Much work has been accomplished in the past on the subject of parallel query
processing  and  optimization  in  parallel  relational  database  systems.  However,  little 
work  on  the  same  subject  has  been  done  in  parallel  object-oriented  database  systems. 
Since the object-oriented view of a database and its processing are quite different from
those of a relational system, it can be expected that the techniques of parallel query
processing and optimization suited to the former differ from those developed for the latter. In this
dissertation,  we  present  two  parallel  architectures,  a  general  framework  for  parallel 
object-oriented  database  systems,  several  implemented  query  processing  and  opti- 
mization strategies  together  with  some  performance  evaluation  results.  In  this  work, 
multi-wavefront algorithms are used in query processing to allow a higher degree
of parallelism than the traditional tree-based query processing. Four optimization
strategies,  which  are  designed  specifically  for  the  multi-wavefront  algorithms  and  for 
the  optimization  of  single  as  well  as  multiple  queries,  are  introduced  and  evaluated. 
A  distributed  result  collection  scheme  which  is  designed  to  support  retrieval  queries 
is  also  introduced.  Furthermore,  two  parallel  architectures,  namely,  master-slave 
and  peer-to-peer  architectures  are  compared.  A  comparison  is  also  made  for  two 
data  placement  strategies,  namely,  class-per-node  vertical  partitioning  and  hybrid 
partitioning.  The  query  processing  algorithms,  four  optimization  strategies  and  the 
distributed  result  collection  scheme  have  been  implemented  on  a  parallel  computer 
nCUBE2,  and  the  results  of  a  performance  evaluation  are  presented  in  this  disser- 
tation. The  main  emphases  and  the  intended  contributions  of  this  dissertation  are 
1)  data  partitioning,  parallel  architecture,  query  processing,  query  optimization  and 
result  collection  strategies  suitable  for  parallel  OODBMSs;  2)  the  implementation  of 
these  strategies;  and  3)  the  performance  evaluation  results. 


CHAPTER  1 
INTRODUCTION 

Research  on  parallel  database  systems  began  in  the  early  1970s  when  the  rela- 
tional model  and  relational  database  management  systems  started  to  become  pop- 
ular. Since  then,  a  considerable  amount  of  work  has  been  carried  out  in  paral- 
lel processing  of  relational  databases.  Many  parallel  query  processing  techniques 
and  algorithms,  particularly  for  the  processing  of  the  time-consuming  Join  opera- 
tion [Vald84,  Grae90,  Kits90,  LuH91,  Chen92],  have  been  introduced,  analyzed,  and 
prototyped.  In  recent  years,  OODBMSs  have  become  quite  popular.  Some  fre- 
quent questions  raised  among  researchers  and  practitioners  in  the  database  area  are: 
"What  are  the  major  differences  between  relational  database  processing  and  object- 
oriented  database  processing?",  "Can  parallel  processing  techniques  and  algorithms 
introduced  for  relational  systems  be  directly  applied  to  object-oriented  systems?", 
and  "What  new  or  modified  parallel  techniques  and  algorithms  can  be  introduced  to 
make  the  future  parallel  OODBMSs  more  efficient?". 

From  the  parallel  processing  perspective,  OODBMSs  differ  from  RDBMSs  in 
the  following  two  main  aspects: 

1.  OODBMSs  deal  with  complex  objects  instead  of  normalized  relational  tuples. 
Data  associated  with  a  complex  object,  say,  the  design  of  an  airplane,  can  be  com- 
posed of  thousands  of  object  instances  of  a  large  number  of  classes.   Each  instance 
may contain data of complex types like set, list, array, bag, image, voice, etc. This
fact  has  two  implications.  First,  a  query  in  an  OODBMS  may  involve  a  large  number 
of  classes.  Traversals  of  multiple  object  classes  for  object  instances  that  satisfy  or 
do  not  satisfy  some  data  conditions  are  frequent  operations.  Support  for  efficient, 
bi-directional  traversals  of  object  instances  as  well  as  the  retrieval  of  them  is  needed 
to  achieve  efficient  query  processing.  Furthermore,  new  parallel  query  optimization 
strategies  will  be  required  to  reduce  the  I/O,  communication,  and  processing  times 
during  these  traversals.  Second,  since  an  instance  of  a  complex  object  may  contain 
much  data,  the  traditional  tuple-oriented  data  access  from  the  secondary  storage  and 
tuple-oriented  query  processing  used  in  relational  DBMSs  may  no  longer  be  suitable. 
Only  parts  of  the  data  associated  with  instances  of  complex  objects  that  are  relevant 
to  a  query  should  be  accessed  instead  of  the  entire  instances  in  order  to  avoid  an 
excessive  I/O  time.  Thus,  different  data  structures  would  be  required  for  efficient 
parallel  processing  of  complex  objects. 

2.  Relational  systems  deal  with  only  the  retrieval,  update,  insertion  and  deletion 
of  data  from  databases.  Further  processing  of  the  retrieved  or  manipulated  data  is 
done  in  application  programs,  and  thus  is  out  of  the  control  of  relational  DBMSs. 
In this type of system architecture, it makes sense to generate temporary relations
in  different  steps  of  query  processing  since  these  generated  results  are  to  be  either 
retrieved  or  further  manipulated  by  storage  operations.  In  OODBMSs,  in  addition  to 
the traditional database operations, user-defined operations and their implementations
(methods) are managed and performed by the systems. Activations of methods
are done by passing proper messages to object instances. It is therefore important
to  store  methods  close  to  their  applicable  instances  so  that  they  are  readily  avail- 
able when  object  instances  which  satisfy  some  search  conditions  have  been  identified. 
Except  for  the  final  retrieval  of  data  in  a  retrieval  query,  assembling  of  descriptive 
data  (or  attribute  values)  in  object  instances  (which  is  equivalent  to  the  genera- 
tion of  temporary  relations)  should  not  be  carried  out  since  it  involves  the  access  of 
large  quantities  of  data  from  the  secondary  storage  (high  I/O  time),  the  assembly 
of  object  instances  (high  processing  time),  and  the  passing  of  assembled  data  from 
processor  to  processor  (high  communication  time).  Some  of  these  assembled  data  are 
not  applicable  to  the  user-defined  operations  specified  in  many  nonretrieval-oriented 
queries. 

These  differences  and  the  increasing  popularity  of  OODBMSs  have  motivated 
our  research  in  data  partitioning,  query  processing  and  query  optimization  strategies 
for  use  in  parallel  OODBMSs.  In  our  research,  we  propose  a  hybrid  data  partitioning 
strategy  (horizontal  and  vertical  partitioning)  to  achieve  a  higher  scalability  while 
maintaining a uniform representation of an OO database across the processing nodes.
In  order  to  achieve  the  partitioned  data  parallelism,  a  global,  logical  query  graph 
is  decomposed  into  many  physical  query  graphs.  This  approach  is  different  from 
the  ones  used  in  other  parallel  systems  [DeWi92,  Grae94]  which  introduced  parallel 
operators  to  bridge  the  gap  between  the  logical  representation  of  a  query  and  the 
physical  allocation  of  data  elements. 


The  parallel  and  asynchronous  multiple  wavefront  algorithms  proposed  by  [Chen95] 
are used in this research as fundamental query processing strategies. We have explored
the graph-based asynchronous query model and developed four optimization
strategies  based  on  the  generic  wavefront  algorithms  to  support  both  single  and  multi- 
ple query  processing.  Based  on  the  multiple  wavefront  algorithms,  a  distributed  result 
collection  scheme  has  been  introduced  to  support  retrieval  queries.  These  strategies 
have  been  implemented  and  the  results  of  a  performance  evaluation  of  these  strategies 
are  presented  in  this  dissertation. 


CHAPTER  2 
SURVEY  OF  RELATED  WORK 


Since  the  focus  of  this  dissertation  is  in  parallel  architectures,  data  partitioning, 
query  processing  and  optimization  strategies  for  parallel  OODBMSs,  we  shall  survey 
the  related  works  in  these  areas. 

Data  Partitioning  is  an  important  issue  in  parallel  RDBMSs.  By  partitioning  (or 
declustering)  a  relation  across  several  disks,  the  database  system  can  exploit  the  I/O 
bandwidth of the disks by reading and writing data in parallel. Some parallel or
distributed database systems have concentrated on horizontal partitioning. There are three basic
horizontal  partitioning  schemes,  namely,  round-robin,  hash,  and  range  partitioning. 
These  schemes  and  their  merits  have  been  described  in  two  works  [SuSY88,  DeWi92]. 
The  horizontal  partitioning  approach  is  essential  for  parallel  RDBMSs  to  achieve  good 
scalability  and  speedup.  The  vertical  data  partitioning  technique  has  been  proposed 
by  some  other  researchers  [Nava84,  Cope85].  The  same  strategy  has  been  used  in  sev- 
eral parallel  database  projects  [LamH87,  Vald87].  This  vertical  partitioning  technique 
(or  decomposed  storage  model)  has  two  major  advantages  for  storing  the  instances 
of  complex  objects.  First,  it  provides  a  uniform  representation  for  complex  objects. 
Second,  it  can  avoid  an  excessive  amount  of  I/O  required  to  access  large  instances 
during  query  processing. 
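
As an illustration only, the three basic horizontal partitioning schemes mentioned above can be sketched in a few lines of Python. The function names, the integer partitioning key, and the list of split points used for range partitioning are assumptions made for this sketch; they are not taken from [SuSY88, DeWi92].

```python
from bisect import bisect_right

def round_robin(tuples, n_disks):
    """Assign the i-th tuple to disk i mod n_disks."""
    parts = [[] for _ in range(n_disks)]
    for i, t in enumerate(tuples):
        parts[i % n_disks].append(t)
    return parts

def hash_partition(tuples, n_disks, key):
    """Assign each tuple by hashing its partitioning attribute."""
    parts = [[] for _ in range(n_disks)]
    for t in tuples:
        parts[hash(key(t)) % n_disks].append(t)
    return parts

def range_partition(tuples, split_points, key):
    """Assign each tuple to the key range it falls in.

    split_points is a sorted list; partition i holds the keys k with
    split_points[i-1] <= k < split_points[i].
    """
    parts = [[] for _ in range(len(split_points) + 1)]
    for t in tuples:
        parts[bisect_right(split_points, key(t))].append(t)
    return parts

# Hypothetical relation of (key, value) tuples.
rows = [(k, "v%d" % k) for k in range(10)]
rr = round_robin(rows, 3)
hp = hash_partition(rows, 3, key=lambda t: t[0])
rp = range_partition(rows, [3, 7], key=lambda t: t[0])
```

Round-robin spreads tuples evenly regardless of their values, hashing lets a lookup on the partitioning attribute go to a single disk, and range partitioning keeps tuples with neighboring keys together.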


Data partitioning increases the complexity of query processing. In traditional
database systems, the query execution plan consists of sequential operators
(e.g., "scan" and "join"). Thus, in some research efforts, parallel operators such
as  "split",  "merge"  and  "exchange"  are  introduced  to  bridge  the  gap  between  the 
physical  data  representation  and  the  logical  query  execution  plan  [DeWi92,  Grae94]. 
In  our  research,  the  queries  are  optimized  at  the  query  graph  level.  We  directly 
transform  a  query  graph  into  another  query  graph  based  on  the  physical  partitioning 
of the instances of those object classes that are involved in the query graph.

The  use  of  IID  pairs  for  the  bi-directional  traversals  of  object  instances  to  be 
presented  in  this  work  is  similar  to  the  "join  index"  concept  introduced  for  processing 
relational  joins  [Vald87].  However,  join  indices  can  be  established  for  any  relations 
that  are  directly  or  indirectly  associated  through  their  common  attributes  according 
to  the  access  patterns  of  an  application.  The  IID-pairs  of  our  system  are  established 
for  all  base  object  classes  that  are  directly  associated  through  object  references. 

The  traversals  of  object  instances  through  their  associations  are  analogous  to 
join  and  semi-join  operations  in  relational  database  systems.  The  join  operation  is 
one  of  the  fundamental  relational  query  operations  and  is  a  time-consuming  one.  A 
recent  survey  on  the  join  operation  can  be  found  in  Mishra  and  Eich  [Mish92].  The 
parallel  execution  of  join  operations  is  an  accepted  solution  for  achieving  query  pro- 
cessing efficiency  [Vald84,  Grae90,  Kits90].  However,  most  of  the  existing  works  have 
addressed  the  problem  of  performing  a  join  involving  only  two  relations.  Recently, 
some researchers have studied parallel execution strategies for multi-way join queries
using different query tree structures, such as right-deep, left-deep and bushy tree
structures  [Schn90,  LuH91,  Hara94].  Others  have  extended  query  optimization  tech- 
niques to  handle  large  and  more  complex  queries  [Swam88,  Ioan90].  The  main  idea 
introduced  in  these  works  is  to  find  the  optimal  join  schedule  or  order.  Several  semi- 
join  strategies  have  also  been  introduced  for  query  processing  in  distributed  database 
systems.  Similar  to  the  join  operation,  most  research  efforts  have  focused  on  the  prob- 
lem of  finding  the  optimal  schedule  or  order  of  semi-join  to  either  reduce  the  number 
of  semi-join  operations  or  the  data  transmission  cost  [Bern81,  YooH89,  Chen91].  In 
these  works  on  joins  and  semi-joins,  a  query  is  first  translated  into  a  tree  structure 
of  relational  operators  and  the  execution  of  the  query  follows  the  structure  from  the 
leaves  to  the  root.  One  of  the  drawbacks  of  the  tree-based  query  processing  approach 
is  that  the  degree  of  parallelism  is  still  limited  by  the  leaves-to-root  order  even  if 
the  pipelining  approach  is  used  for  processing  the  operations.  Furthermore,  these 
works  consider  the  efficient  processing  of  a  single  query.  While  it  is  fairly  well  under- 
stood how  to  achieve  the  optimal  schedule  in  a  single  query,  little  is  known  about  the 
optimal  processing  of  complex,  multiple  queries  in  a  multi-user  environment.  The  re- 
search results  for  single  query  optimization  are  not  always  applicable  to  multi-query 
optimization.  For  example,  most  single-query  processing  techniques  use  the  response 
time  of  each  query  as  the  main  performance  measurement.  The  horizontal  partition- 
ing of  a  large  file  or  relation  is  used  to  exploit  the  intra-query  parallelism,  so  as  to 
reduce  the  response  time  of  a  single  query.  This  approach  fails  to  achieve  a  balance 
between the intra-query parallelism and the inter-query parallelism. In our work, we
try to achieve a balance between these two types of parallelisms so that the overall
response  time  of  a  set  of  queries  can  be  reduced. 

Several  interesting  works  have  dealt  with  parallel  and  non-parallel  processing 
of OO databases. Pointer-based join techniques for both centralized and parallel
OO databases have been studied in two works [Shek90, Lieu93]. In these works,
the  evaluations  only  consider  the  joining  of  two  object  classes.  Some  object-oriented 
database  systems  automatically  convert  OIDs  stored  in  objects  to  memory  pointers 
to  other  objects  when  they  load  objects  from  the  secondary  storage  to  memory. 
This  conversion  is  known  as  pointer  swizzling  [KimW88,  Whit92].  Pointer  swizzling 
makes  possible  efficient  navigation  of  linked  objects  residing  in  memory.  However,  it 
heavily  depends  on  the  virtual  memory  mechanism  of  the  operating  system.  Database 
systems  using  pointer  swizzling  techniques  may  face  difficulties  when  they  are  ported 
from  one  platform  to  another.  Also,  pointer  swizzling  may  not  be  applicable  to  the 
shared-nothing parallel computer in which there is no global memory space. Class
traversals  have  been  proposed  to  find  the  join  order  of  the  classes  in  a  query  graph 
to  find  some  associated  objects  [Jenq90,  KimW89a].  Another  work  uses  an  assembly 
operator  to  translate  a  set  of  complex  objects  from  their  disk  representations  to 
memory  representations  which  can  be  quickly  traversed  [Kell91].  However,  these 
works  are  patterned  after  relational  query  processing  techniques  by  translating  a 
query  graph  into  a  tree  structure.  This  tree-based  approach  implies  a  pair-by-pair, 
bottom-up  evaluation  of  the  query  tree  which  limits  the  inter-operator  parallelism 
and can lead to the generation of large intermediate results. To overcome these
drawbacks, we use a graph-based query processing technique. It allows either all the
processors  or  many  processors  that  manage  those  object  classes  referenced  by  a  query 
to  work  on  the  query  at  the  same  time,  thus  achieving  a  higher  degree  of  parallelism. 
Recognizing  that  keeping  many  processors  busy  does  not  necessarily  bring  about  an 
overall  efficiency  in  multiple  query  processing,  we  also  introduce  several  optimization 
strategies  to  avoid  nonproductive  computations  by  some  processors  so  that  they  can 
be  used  to  process  other  queries. 

In  OODBMSs,  the  encapsulation  of  methods  with  the  data  they  operate  on 
makes the query optimization more difficult in the following ways. First, estimating
the  cost  of  executing  methods  is  considerably  more  difficult.  Second,  encapsulation 
raises  issues  related  to  the  accessibility  of  storage  information  by  the  query  optimizer. 
Some  systems  overcome  this  difficulty  by  treating  the  query  optimizer  as  a  special 
application  which  can  violate  encapsulation  and  access  information  directly  [Clue92]. 
Others  propose  a  mechanism  whereby  objects  "reveal"  their  costs  as  part  of  their 
interface  [Grae88].  In  our  research,  a  heuristic  approach  is  used  which  takes  into 
consideration  some  limited  storage  access  information.  Thus,  we  assume  that  the 
query  optimizer  can  access  storage  information  as  a  special  application. 

The  need  for  parallel  processing  of  data  and  their  complex  relationships  has 
been recognized [BicL86, BicL89, DeWi90, KimK90]. The work by DeWitt et al.
analyzes three distributed workstation-server architectures (namely, object, page and
file servers) for efficient processing of queries based on an OO data model. This
work varies the degree of data clustering and the buffer size in its analysis of the
performance of these three architectures. It does not investigate parallel architectures
and algorithms for processing and optimizing OO queries. The AGM system [BicL86,
BicL89]  represents  and  processes  a  database  as  a  network  of  interrelated  entities 
and  relationships  modeled  by  the  ER  model.  An  asynchronous  approach  is  used  to 
process  queries.  Our  work  uses  a  similar  data  representation  and  processing  approach. 
However,  the  granularity  of  computation  in  AGM  is  at  the  data  element  level.  In 
our opinion, this is not very suitable for processing very large OO databases since
a large number of tokens carrying a substantial amount of data would have to be
generated, transmitted and processed. Also, the result of a query in AGM is not
represented structurally in the same model as the original database; thus, it cannot
be  further  operated  on  by  the  same  query  model  (i.e.,  the  closure  property  is  not 
maintained).  The  work  presented  in  Kim's  paper  [KimK90]  analyzes  three  types  of 
parallelism in processing OO queries (namely, node parallelism, path parallelism and
class-hierarchy parallelism). They are also exploited in our work. However, Kim's
work takes an analytical approach and only considers queries which access the object
instances  of  a  single  target  class.  In  our  work,  parallel  algorithms  are  implemented 
to  process  multiple  queries  which  access  object  instances  of  multiple  target  classes, 
their  interrelationships,  and  their  attribute  values. 


CHAPTER  3 
OBJECT-ORIENTED  DATABASE  AND  QUERY  SPECIFICATION 

In this chapter, we describe an OO view of a database and graph-based query
specification and processing.

3.1     Object-oriented  View  of  a  Database 

An  object-oriented  database  (OODB)  can  be  viewed  as  a  collection  of  objects, 
grouped  together  in  classes  and  interrelated  through  various  types  of  associations 
[SuSY89,  KimW89b,  Well92,  Ishi93,  Bham93].  It  can  be  represented  by  graphs  at 
both  the  intensional  and  the  extensional  levels.  At  the  intensional  (schema)  level,  a 
database  is  defined  by  a  collection  of  inter-related  object  classes  in  form  of  a  Schema 
Graph  (SG).  Figure  3.1  shows  the  SG  of  a  university  database.  Rectangle  vertices 
represent  entity  classes  and  circle  vertices  represent  domain  classes.  Objects  in  entity 
classes are entities of interest in an application domain. Each object has a system-assigned
unique object identifier (OID). Objects in a domain class serve as values (e.g.,
integer 10, character string "algorithm") for defining other entity or complex domain
class  objects.  The  associations  among  classes  are  represented  by  the  edges  in  SG. 
For  example,  the  association  between  Course  and  Department  is  represented  by  an 
attribute-domain  link  (a  fine  line),  and  the  association  between  Person  and  Student 
is  represented  by  a  superclass-subclass  link  (a  bold  arrow).  At  the  extensional  (object 
instance) level, a database can be viewed as a network of object instances in different
classes, inter-related through their associations. This can be represented as an
Object Graph (OG). Figure 3.2 shows an OG of a portion of the university database.
Every object instance on this graph is assigned by the system an instance identifier
(IID) which is the concatenation of an OID and a class ID. Symbols such as r1,
g2, s1, etc., instead of numbers are used as IIDs for ease of reference. Links in
the  figure  show  the  bi-directional  references  between  object  instances.  We  note  here 
that  object  instances  are  the  data  representations  of  objects  in  their  classes.  In  this 
example  database,  we  assume  that  the  distributed  or  dynamic  model  of  inheritance  is 
used,  in  which  data  associated  with  an  object  are  distributed  in  the  object  classes  of  a 
class  lattice  instead  of  the  centralized  or  static  model,  in  which  all  its  data  are  stored 
in a bottom node of the class lattice. The former model achieves the inheritance of
attributes and methods at run time, whereas the latter model does so at compilation time.
The  query  processing  and  optimization  strategies  presented  in  this  dissertation  are 
applicable  to  both  inheritance  models. 

3.2     Query  Graph  and  Query  Processing 

Based  on  the  above  graphical  model  of  OODBs,  an  object-oriented  query  lan- 
guage called  OQL  has  been  introduced  [Alas89]  in  which  a  query  can  be  specified  by 
a  query  graph.  A  query  graph  is  a  subgraph  of  the  schema  graph  and  consists  of  a 
linear,  tree  or  network  structure  of  object  classes  having  association  operators,  non- 
association operators and AND-OR branches. For example, the query "For all the
graduate research assistants, find their GPAs, the numbers of hours of their appointments,
department names, and the section numbers of the courses they are taking"
can be written using the object query language (OQL) as:

context RA*Grad*Student AND (*Section, *Department)
retrieve Student.gpa, RA.hours, Department.name, Section.num

The  context  part  of  the  query  specifies  the  query  graph  shown  in  Figure  3.3.  In 
the  query  graph,  a  vertex  represents  a  class,  and  an  edge  with  an  association  operator 
"*"  specifies  that  only  those  instances  of  two  adjacent  classes  that  are  associated  with 
each  other  in  the  extensional  database  are  of  interest  to  the  query.  If  a  non-association 
operator  "!"  is  used,  only  those  instances  that  are  not  associated  with  each  other  will 
be  identified.  The  AND  branch  states  that  an  instance  of  the  class  Student  must 
be  associated  with  some  instances  in  both  classes  Section  and  Department.  An  OR 
branch  would  specify  the  OR  condition  of  object  associations.  Range  variables  can 
be  specified  for  the  classes  referenced  in  the  context  statement. 
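
The semantics of the association operator, the non-association operator, and the AND/OR branches can be illustrated with a minimal Python sketch over sets of IID pairs. The function names and the sample data (patterned on the Student columns of Figure 4.1) are assumptions of this sketch, and the reading of "!" as "instances with no link to the other class" is one interpretation of the description above, not a definition taken from OQL.

```python
def associate(pairs):
    """A * B: instances of A and of B that take part in at least one link."""
    left = {a for a, _ in pairs}
    right = {b for _, b in pairs}
    return left, right

def non_associate(left_all, right_all, pairs):
    """A ! B (assumed reading): instances that have no link to the other class."""
    linked_left, linked_right = associate(pairs)
    return set(left_all) - linked_left, set(right_all) - linked_right

# Hypothetical extensional data.
students = {"s1", "s2", "s3", "s4", "s5"}
sections = {"se1", "se2"}
student_section = {("s1", "se1"), ("s1", "se2"), ("s2", "se1"),
                   ("s2", "se2"), ("s4", "se2"), ("s5", "se2")}
student_dept = {("s1", "d1"), ("s2", "d1"), ("s3", "d2"),
                ("s4", "d2"), ("s5", "d2")}

with_section, _ = associate(student_section)
with_dept, _ = associate(student_dept)

# Student AND (*Section, *Department): linked to both classes.
and_branch = with_section & with_dept
# Student OR (*Section, *Department): linked to at least one class.
or_branch = with_section | with_dept
# Student ! Section: students taking no section.
without_section, _ = non_associate(students, sections, student_section)
```

In this sample, student s3 has a department but no section, so s3 is excluded by the AND branch, kept by the OR branch, and is the only student selected by the non-association operator.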

The  processing  result  of  this  query  graph  is  shown  in  Figure  3.4  which  is  a 
subgraph  of  Figure  3.2.  After  having  identified  object  instances  in  the  multiple  classes 
which  satisfy  the  context  specification,  system-  or  user-defined  operations  specified 
in  the  query  can  then  be  performed  on  these  instances.  In  this  example,  a  retrieval 
operation is performed to obtain the hours, the GPAs, the department names, and
the section numbers. In a more complex query, if attribute comparisons are involved,
they are specified in a WHERE subclause of the context statement with quantifiers
and complex predicates. If there are multiple links (or attributes) associated with two
classes, the link or attribute name is given after the "*" or "!" operator to identify
the specific link.

Figure 3.1. Schema Graph of a University Database

Figure 3.2. Object Graph (OG)

Since  a  query  graph  can  be  very  complex  structurally  and  graph  searches  have 
to  be  carried  out  in  a  potentially  very  large  extensional  database,  the  processing  of 
such an OO query can be very time-consuming. For example, if the relational query
processing approach of generating temporary relations is adopted for OO query
processing, complex data structures will have to be established and maintained in
each step of the association/non-association operation to construct the aggregated
instances (i.e., similar to relational joins), and these data of complex data types will
have to be passed from one processor to another in a multi-processor computing
environment to perform object traversals. Furthermore, the aggregated instances do
not belong to any predefined classes. They cannot be further processed by predefined
methods due to type checking problems. They may contain data which are
not relevant to any user-defined operations. A better way to identify
object  instances  that  satisfy  the  context  specification  is  to  retrieve  the  proper  part 
of  the  object  graph  (i.e.,  the  extensional  database)  from  the  secondary  storage  to 
the  main  memory,  traverse  the  in-memory  structure  to  mark  the  proper  instances  for 
subsequent  processing  instead  of  forming  temporary  data  structures.  The  original 
structural  properties  of  these  object  instances  are  maintained  in  the  object  graph. 
However,  this  method  would  require  bi-directional  traversals  of  object  instances  since 
a  disqualification  of  an  object  instance  would  cause  the  disqualifications  of  many 
other  associated  instances,  thus  causing  backward  propagation  of  IIDs.  Bidirectional 
traversals  need  to  be  supported  by  an  efficient  query  processing  strategy  and  graph- 
based  traversal  algorithms. 

In  this  work,  a  two-phase  query  processing  strategy  [LamH89,  Thak90]  is  adopted 
to  access  and  manipulate  an  OODB.  In  the  first  phase,  multi-wavefront  algorithms 
(see  the  next  section)  are  applied  to  identify  the  object  instances  that  satisfy  the 
context specification. Local selection conditions, if they are specified in the WHERE
subclause,  are  applied  by  the  involved  processors  to  their  classes  in  this  phase.  In  the 
second phase, system- and/or user-defined operations are executed on these object
instances. Since the retrieval of descriptive data to form the final retrieval result
is  postponed  until  the  second  phase  when  all  the  instances  that  satisfy  the  context 
specification  have  been  identified,  this  processing  strategy  reduces  the  I/O  time  and 
avoids  the  generation  of  large  temporary  instances  during  object  instance  traversals. 
Another  advantage  of  this  approach  is  that  the  original  structural  properties  of  these 
instances  are  preserved  and  can  be  used  in  the  system-  and  user-defined  operations 
in  the  second  phase.  Thus,  the  closure  property  is  preserved. 
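
The two-phase idea can be sketched for a linear context such as RA*Grad*Student. The sketch below is sequential Python and is not the multi-wavefront algorithm; it only shows that phase one works on IIDs alone, with forward and backward propagation of disqualifications, and that attribute values are read in phase two for the surviving instances only. The function name and all sample data are assumptions of this sketch.

```python
def qualify_chain(classes, links):
    """Phase 1: mark the instances that satisfy a linear context A*B*C...

    classes: list of IID sets, one per class along the chain.
    links:   links[i] is a set of (iid_i, iid_i_plus_1) pairs.
    Returns the list of surviving IID sets; no attribute data is touched.
    """
    marked = [set(c) for c in classes]
    # Forward pass: keep an instance only if a surviving left neighbour references it.
    for i, pairs in enumerate(links):
        marked[i + 1] &= {b for a, b in pairs if a in marked[i]}
    # Backward pass: a disqualification on the right propagates back to the left.
    for i in range(len(links) - 1, -1, -1):
        marked[i] &= {a for a, b in links[i] if b in marked[i + 1]}
    return marked

# Hypothetical IID sets and IID-IID pairs.
ras = {"r1", "r2"}
grads = {"g1", "g2", "g3"}
students = {"s1", "s2", "s3", "s4", "s5"}
ra_grad = {("r1", "g1"), ("r2", "g3")}
grad_student = {("g1", "s1"), ("g2", "s3"), ("g3", "s4")}

ra_ok, grad_ok, student_ok = qualify_chain(
    [ras, grads, students], [ra_grad, grad_student])

# Phase 2: retrieve descriptive data only for the qualified instances.
gpa_column = {"s1": 3.5, "s2": 3.6, "s3": 3.5, "s4": 3.3, "s5": 3.6}
result = {iid: gpa_column[iid] for iid in student_ok}
```

Here g2 is dropped in the forward pass because no research assistant references it, which in turn disqualifies s3; the GPA column is consulted only for s1 and s4.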


Figure 3.3. The Query Graph

Figure 3.4. The Resulting Subdatabase


CHAPTER  4 
A  GENERAL  FRAMEWORK  OF  PARALLEL  OODBMSS 


4.1     A  Hybrid  Data  Partitioning  Approach 

Data  partitioning  and  placement  is  an  important  issue  in  parallel  database  sys- 
tems since  it  affects  system  performance.  In  OODBMSs,  we  believe  that  there  is  a 
need for a hybrid data partitioning strategy (a combination of vertical partitioning
and  horizontal  partitioning).  In  vertical  partitioning,  instances  of  a  class  are  verti- 
cally partitioned  as  illustrated  by  Figure  4.1.  There  are  two  types  of  partitions:  (a) 
data  values  stored  in  IID-data  value  pairs,  and  (b)  instance  cross-references  stored  in 
IID-IID  pairs. 


Student IID  GPA     Student IID  Grad IID     Student IID  Section IID     Student IID  Dept IID
s1           3.5     s1           g1           s1           se1,se2         s1           d1
s2           3.6     s2                        s2           se1,se2         s2           d1
s3           3.5     s3           g2           s3                           s3           d2
s4           3.3     s4           g3           s4           se2             s4           d2
s5           3.6     s5                        s5           se2             s5           d2
...          ...     ...          ...          ...          ...             ...          ...

Figure  4.1.  Vertical  Partitioning  of  Class  Student 

Vertical  partitioning  of  data  improves  I/O  parallelism  and  avoids  the  retrieval 
of data not needed by the query. By storing the attribute columns in different
files, different queries which access different attribute values of the same set of object
instances can be carried out concurrently. In other words, the intra-object parallelism
can  be  exploited.  Also,  when  the  data  of  an  object  is  to  be  retrieved,  only  those 
needed  attribute  values  need  to  be  accessed  from  the  secondary  storage  instead  of 
all  the  values  that  form  an  object  instance.  This  saving  can  be  very  significant  for 
complex  objects  because  their  attribute  values  can  be  data  of  complex  data  types 
such as video and audio. Vertical partitioning also provides a simple and uniform
representation for complex objects in that cross-reference data (associations between
instances  of  different  classes)  can  be  represented  in  the  same  structure  as  the  attribute 
values  of  objects. 

Intuitively, the vertical partitioning approach works well if the following two
conditions hold. First, the number of object classes is greater than the number of
processing nodes of the parallel computer and the sizes of the object classes are about
the same. Second, queries issued against the database access the object classes with
about the same probability. However, these conditions do not always hold in real-world
applications. A database schema may contain some very large classes that are
accessed by queries much more frequently than the other classes. Under
this circumstance, the vertical partitioning strategy will not scale up well. Thus,
horizontal fragmentation after vertical partitioning (i.e., hybrid partitioning) should
be used. Figure 4.2 shows a hybrid partitioning of the class Student. In this example,
all the hybrid segments starting from object instance s1 are mapped to processor node
1, all the hybrid segments starting from object instance sn are mapped to processor
node 2, etc. Horizontal partitioning can exploit inter-object parallelism. That
is, a query can be processed against the horizontal segments of vertically partitioned
data concurrently.

Figure 4.2. Hybrid Partitioning of Class Student
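
The two partitioning steps can be illustrated with a small Python sketch. It is not the
implementation evaluated in this work. The function names, the fixed segment size and the
rule "segment k goes to node k + 1" are assumptions made for the example, and the sketch
assumes that every instance has a value in every column.

```python
from collections import defaultdict

def vertical_partition(instances):
    """Split row-style instances into binary columns of (IID, value) pairs."""
    columns = defaultdict(list)
    for iid, attributes in instances.items():
        for name, value in attributes.items():
            columns[name].append((iid, value))
    return dict(columns)

def hybrid_partition(columns, segment_size):
    """Cut every binary column into horizontal segments.

    Returns {node number: {column name: [(IID, value), ...]}}. Segment k of
    every column goes to node k + 1 (an assumed placement rule), so one node
    holds all the columns of the same group of instances.
    """
    nodes = defaultdict(dict)
    for name, pairs in columns.items():
        for start in range(0, len(pairs), segment_size):
            node = start // segment_size + 1
            nodes[node][name] = pairs[start:start + segment_size]
    return dict(nodes)

# GPA and Dept values of the five Student instances of Figure 4.1.
students = {
    "s1": {"GPA": 3.5, "Dept": "d1"},
    "s2": {"GPA": 3.6, "Dept": "d1"},
    "s3": {"GPA": 3.5, "Dept": "d2"},
    "s4": {"GPA": 3.3, "Dept": "d2"},
    "s5": {"GPA": 3.6, "Dept": "d2"},
}
columns = vertical_partition(students)
placement = hybrid_partition(columns, segment_size=3)
```

With a segment size of 3, node 1 receives the GPA and Dept segments of s1 to s3 and node 2
receives those of s4 and s5.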


The data structure used to store the data partitions in each node is based on
the concept of "join indices." It is designed to facilitate the bi-directional traversal
of object instances. The data associated with all instances of a horizontal segment
are partitioned into vertical binary columns. There are two types of binary columns:
IID-IID pairs for storing inter-object references between two adjacent classes and
IID-attribute-value pairs for storing the descriptive data of objects. The binary columns
of the first type are pre-sorted based on the IIDs through which object references are
to be accessed. For large object classes, the binary columns of the second type are
supported by traditional indexing schemes for fast access to data values given
some IIDs and fast access to IIDs given some data values. In Figure 4.4, some
data partitions (for simplicity's sake, no IID-attribute-value pairs are shown in this
example) and methods defined in the five object classes RA, Grad, Student, Section
and Department are stored in processors P1, P2, P3, P4 and P5, respectively. Each
processor which holds a partition of a class maintains the IID-to-IID references to
the partitions of all its adjacent classes. For example, g1 -> r1; g2 -> r2; g3 -> r3 are
stored in processor P1 to record the inter-instance references between RA and Grad
partitions, and r1 -> g1; r2 -> g2; r3 -> g3 and s1 -> g1; s2 -> ; s3 -> g2; s4 ->
g3; s5 -> are stored in processor P2 to record the inter-instance references between
Grad instances and RA and Student instances, respectively. This structure allows
bi-directional traversals of object instances and can be viewed as pre-computed joins
in a relational database [Vald87, BicL86].

In addition to the IID-IID pairs, an integer array CON is established for each
adjacent partition as shown in Figure 4.4. Each element of the array corresponds
to one instance of the partition stored in the processor, and the integer value is
the number of connections between that instance and the instances of an adjacent
partition. For example, the elements of array Section.CON in the Student partition
have values 2, 2, 0, 1, and 1, which specify the numbers of connections s1, s2, s3, s4,
and s5 have with the instances of the Section partition, respectively. These integer
arrays are used in the multi-wavefront algorithms to be described in the next section.
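
As an illustration, a CON array can be derived directly from a binary column of IID-IID
pairs. The Python sketch below uses the Student-Section references of Figure 4.1; the
function name build_con is hypothetical and the code is not taken from the implementation.

```python
def build_con(local_iids, pairs):
    """Count, for each local instance, its connections to one adjacent partition.

    pairs is one binary column of (local IID, neighbor IID) tuples.
    """
    counts = {iid: 0 for iid in local_iids}
    for local, _neighbor in pairs:
        counts[local] += 1
    return [counts[iid] for iid in local_iids]

student_iids = ["s1", "s2", "s3", "s4", "s5"]
student_section = [
    ("s1", "se1"), ("s1", "se2"),
    ("s2", "se1"), ("s2", "se2"),
    ("s4", "se2"),
    ("s5", "se2"),
]
section_con = build_con(student_iids, student_section)
```

For these pairs, section_con holds the values 2, 2, 0, 1 and 1 quoted above.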

4.2     A  Distributed  Graph-based  Query  Processing  Strategy 

4.2.1     Query  Graph  Modification  Approach 

Similar to SQL, an OO query language is a nonprocedural language. Thus, the
physical data allocation information is transparent to the users and is not stated in
the query language. In traditional RDBMSs, a query is transformed into a tree-structured
query execution plan before it is executed in a bottom-up manner. In parallel
RDBMSs, some parallel operators, such as SPLIT and MERGE, are introduced to
the tree structure of the query execution plan (QEP) to bridge the gap between the
logical representation of a query and the physical mapping of the data [DeWi92].
The same approach is also used in some recent research in OODBMSs [Grae94]. In
our work, we propose a different approach which modifies a logical query graph into
another query graph based on the physical mapping of the data. For example, if
classes Grad and Student are horizontally partitioned into Grad1, Grad2, Student1,
and Student2 partitions, respectively, and each of the other classes has a single
partition, then the query will have to be processed against the four combinations of data
partitions to obtain the final result (i.e., Grad1, Student1, and the partitions of all
other classes form a combination, etc.). The query graph shown in Figure 3.3 can be
transformed into a query graph as shown in Figure 4.3. From this figure, one can see
that the horizontal parallelism can be captured by the "OR" branches. Although the
algorithms for implementing "OR" and "AND" branches achieve results similar to those
of the "Split", "Merge" or "Exchange" operators, the main differences between the query
modification approach and the parallel operator approach are: 1) "OR" and "AND"
branches are defined in our original data query model; they are not operators
introduced specifically for a parallel platform; 2) processing of the partitioned data in the
former approach can take advantage of the graph-based, multi-wavefront algorithms
as well as graph-based optimization strategies.

Figure 4.3. Query Graph Modification Approach

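
The combinations of data partitions mentioned above form a Cartesian product over the
partition lists of the classes. The Python sketch below enumerates them. The class names come
from the running example, but the exact set of classes in the query of Figure 3.3 is assumed
here, and partition_combinations is a hypothetical helper.

```python
from itertools import product

def partition_combinations(partitions):
    """Return one {class: partition} mapping per combination of partitions."""
    classes = list(partitions)
    choices = (partitions[cls] for cls in classes)
    return [dict(zip(classes, combo)) for combo in product(*choices)]

partitions = {
    "RA": ["RA"],
    "Grad": ["Grad1", "Grad2"],
    "Student": ["Student1", "Student2"],
    "Section": ["Section"],
    "Department": ["Department"],
}
combos = partition_combinations(partitions)
```

Two partitions each for Grad and Student give the four combinations against which the
query has to be processed; each one corresponds to an "OR" branch of the modified graph.
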
4.2.2     Multiple  Wavefront  Algorithms 

The processing of each query graph against a combination of data partitions
is based on two multiple wavefront algorithms introduced in our previous work: the
identification approach and the elimination approach. In both algorithms, data
relevant to the processing of a query graph are retrieved from the secondary storage
devices in parallel and manipulated in main memory by the processors which hold
the partitions of object classes referenced in the query. In the identification approach,
partitions referenced by a query are classified into two types. Partitions with more
than one "AND" conditioned edge in the query graph are called non-terminal
partitions; otherwise, they are called terminal partitions. Query processing starts at all the
processors that manage the terminal partitions. These processors do their local
selections of instances (if selection conditions are given in the query), look up the proper
binary columns of IID-IID pairs, and send out the IIDs of the associated instances
of their only neighboring class which satisfy the selection and instance reference
conditions. Each propagation of IIDs forms a wavefront moving toward all other nodes
of the query graph. Multiple wavefronts cross one another in an asynchronous
fashion and the operations of all the processors depend on the operators ("*" or "!")
and branch conditions (AND or OR) given in the query. The behaviors and
termination conditions of processors that contain terminal and non-terminal partitions are
as follows. Each node sends an end_marker to a neighboring node
immediately after it sends a wavefront to that node. A node terminates when the
number of end_markers it has received equals the number of edges it has. The
processor that contains a non-terminal partition receives streams of IIDs from all
its "OR" conditioned neighbors and all its "AND" conditioned neighbors but one.
It processes those streams of IIDs and selects the instances that satisfy the local
selection condition and the "AND" and "OR" branch conditions. Then, it sends
the IIDs of the associated instances of the only remaining neighboring partition to
the corresponding processor. The processor of a non-terminal partition performs
its local selection and processes the incoming streams of IIDs. When it receives
the last incoming stream of IIDs, it processes it and passes the IIDs of the
associated instances of all other neighbors to these neighbors, except the sender of
the last incoming IID stream. Figure 4.5 illustrates the execution of the query given
in Figure reffq. In this algorithm, the processing starts from all terminal nodes and
each processor reports to its neighbor(s) the instances that satisfy the search. A
terminal node terminates its processing after it has received an end_marker from its only
neighbor. We note that all the "OR" conditioned edges can be treated as one edge.
Therefore, Figure 4.3 is not a cyclic graph. The processing of a cyclic graph can be
found in previous work [Chen95].

Figure 4.4. Data Structures for the Multiple Wavefront Algorithms
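
The effect of the identification approach can be illustrated on the simplest case, a linear
query graph whose edges are all "AND" conditioned associations. The Python sketch below is a
sequential simulation written for this illustration only: the real algorithm runs
asynchronously on separate processors and uses end_markers for termination, which the sketch
omits. The function name identify_chain and the data layout are assumptions; the IIDs are
those of the running example.

```python
def identify_chain(selected, links):
    """Simulate the two wavefronts of a linear (chain) query graph.

    selected[i]: set of IIDs of class i that pass the local selection.
    links[i]:    set of (IID of class i, IID of class i + 1) pairs.
    Returns one set of qualifying IIDs per class.
    """
    n = len(selected)
    # Wavefront started by the terminal partition at the left end.
    forward = [set(selected[0])]
    for i in range(1, n):
        reached = {b for a, b in links[i - 1] if a in forward[i - 1]}
        forward.append(reached & selected[i])
    # Wavefront started by the terminal partition at the right end.
    backward = [set() for _ in range(n)]
    backward[n - 1] = set(selected[n - 1])
    for i in range(n - 2, -1, -1):
        reached = {a for a, b in links[i] if b in backward[i + 1]}
        backward[i] = reached & selected[i]
    # An instance qualifies when both wavefronts have identified it.
    return [forward[i] & backward[i] for i in range(n)]

# Chain RA - Grad - Student - Section with no selection conditions.
selected = [
    {"r1", "r2", "r3"},
    {"g1", "g2", "g3"},
    {"s1", "s2", "s3", "s4", "s5"},
    {"se1", "se2"},
]
links = [
    {("r1", "g1"), ("r2", "g2"), ("r3", "g3")},
    {("g1", "s1"), ("g2", "s3"), ("g3", "s4")},
    {("s1", "se1"), ("s1", "se2"), ("s2", "se1"), ("s2", "se2"),
     ("s4", "se2"), ("s5", "se2")},
]
qualified = identify_chain(selected, links)
```

Here s3 is dropped because it takes no section, which in turn disqualifies g2 and r2; s2 and
s5 are dropped because they are not associated with any Grad instance.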


In contrast with the identification algorithm, the elimination algorithm
eliminates object instances that do not satisfy the context specification. When an object
instance in the in-memory object graph is eliminated during query processing, the
associated instances of the neighboring partitions that are left without a qualifying
connection will have to be eliminated as well. This
may in turn cause their associated instances to be eliminated. Thus, the
elimination process is repeated until all the unqualified instances have been eliminated.
In this algorithm, all processors become active after receiving the query graph, and
they can start processing local instances (e.g., do local selections and check instance
connectivities) without waiting for the waves of IIDs from the neighboring processors
(classes). For this reason, the elimination algorithm achieves a higher degree of
parallelism than the identification algorithm (in the case of the AND branch condition).
In this algorithm, each processor reports to its neighbor(s) the instances that have
been eliminated. The counts in the proper integer arrays are decremented. When
an entry of an array becomes zero, the corresponding instance is eliminated and the
IIDs of those eliminated instances are sent to the neighbors so that they can in turn
eliminate their instances.
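
The counting mechanism can be sketched in Python as follows. This is a single-process
illustration under simplifying assumptions (all query edges are "AND" conditioned, and a
work queue replaces the messages exchanged between processors); the function name eliminate
and the dictionary layout are hypothetical.

```python
from collections import defaultdict, deque

def eliminate(instances, links):
    """instances: {class: set of IIDs passing the local selection}.
    links: {(class_a, class_b): set of (IID_a, IID_b)} for each query edge.
    Returns {class: surviving IIDs}.
    """
    # neighbors[(class, IID)][other class] = linked (other class, IID) nodes
    neighbors = defaultdict(lambda: defaultdict(set))
    for (ca, cb), pairs in links.items():
        for a, b in pairs:
            if a in instances[ca] and b in instances[cb]:
                neighbors[(ca, a)][cb].add((cb, b))
                neighbors[(cb, b)][ca].add((ca, a))
    # con[node][other class] plays the role of the CON integer arrays.
    con = {(cls, iid): {} for cls, iids in instances.items() for iid in iids}
    for ca, cb in links:
        for a in instances[ca]:
            con[(ca, a)][cb] = len(neighbors[(ca, a)][cb])
        for b in instances[cb]:
            con[(cb, b)][ca] = len(neighbors[(cb, b)][ca])
    # Instances with no connection along some query edge are eliminated first.
    queue = deque(n for n in con if any(c == 0 for c in con[n].values()))
    alive = set(con) - set(queue)
    while queue:
        node = queue.popleft()
        for linked in neighbors[node].values():
            for other in linked:
                if other in alive:
                    con[other][node[0]] -= 1  # one connection fewer
                    if con[other][node[0]] == 0:
                        alive.discard(other)
                        queue.append(other)
    return {cls: {i for i in iids if (cls, i) in alive}
            for cls, iids in instances.items()}

instances = {
    "RA": {"r1", "r2", "r3"},
    "Grad": {"g1", "g2", "g3"},
    "Student": {"s1", "s2", "s3", "s4", "s5"},
    "Section": {"se1", "se2"},
}
links = {
    ("RA", "Grad"): {("r1", "g1"), ("r2", "g2"), ("r3", "g3")},
    ("Grad", "Student"): {("g1", "s1"), ("g2", "s3"), ("g3", "s4")},
    ("Student", "Section"): {("s1", "se1"), ("s1", "se2"), ("s2", "se1"),
                             ("s2", "se2"), ("s4", "se2"), ("s5", "se2")},
}
survivors = eliminate(instances, links)
```

On this data the elimination of s3 brings the Student count of g2 to zero, which eliminates
g2 and then r2, so the survivors match the instances found by the identification approach.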

Although there is a significant difference between these two algorithms, some
processing techniques are applicable to both of them. For example, in both
algorithms, the propagation and processing of IIDs can be carried out in a pipelined
fashion, thus increasing the degree of parallelism. Also, in order for both algorithms
to know when to terminate, the end_marker is introduced.

4.2.3  Result  Collection 

The above multiple wavefront algorithms mark the IIDs that satisfy the query
pattern and send the IIDs, attributes and all the IID-IID cross-reference information
to a result collection (RC) node. Upon receiving this information, the RC node
traverses the cross-reference information to reconstruct the query results. This approach
creates a potential bottleneck in the RC node (even though more than one processing
node can be allocated as the RC nodes for different queries). An alternative is to
use a distributed result-collection approach which will be discussed in Chapter 5.

4.2.4  Method  Processing  and  Attribute  Inheritance 

In OODBMSs, the behavioral properties of objects are defined by method
specifications in object classes. Due to the inheritance property, all the methods defined
in an object class can be applied to the instances of all its subclasses. Likewise, the
object instances of these subclasses can also inherit the attributes defined in their
superclasses. A challenge for the design of a parallel OODBMS is how to map method
implementations and object instances to processors in such a way that the following
two goals can be reached. The first goal is to place data and their applicable method
implementations as close as possible so that data and/or code do not have to be
moved. The second goal is to make the attribute inheritance as efficient as possible.
In our approach, the following rules are used to achieve the above goals.

•  All  the  methods  applicable  to  an  object  class  are  replicated  in  the  processing 
nodes  to  which  the  instances  of  the  class  are  mapped  following  the  hybrid  data 
partitioning  strategy. 

•  The  mapping  generally  starts  from  the  root  class  of  generalization  hierarchies 
using  the  top  down  approach  until  all  the  classes  are  allocated. 

•  The  instances  of  multiple  classes  in  an  inheritance  hierarchy  or  lattice  which 
hold  the  data  of  the  same  object  are  mapped  to  the  same  processing  node. 

•  The object classes which have aggregation associations (attribute links) with the
classes in the generalization hierarchies can be mapped to the processors after
all the classes in the generalization hierarchies are allocated, so as to balance the
load. Alternatively, they can be mapped randomly. After collecting system running
information, further adjustment can be done to achieve load balancing.

Figure 4.6 shows an example of applying the above rules. The object instances
of class Person are mapped to P1, P2, P3 and P4. Thus, the methods defined by
class Person are replicated in these four processing nodes. Similarly, the methods
defined by class Teacher are replicated to P2 and P3. The numbers 1 to 8 represent
OIDs. By this mapping, the methods can be applied to the data of the object classes
concurrently, thus improving the performance. At the same time, there is no separation
of the data and their applicable methods. No transfer of data or methods
is required during query execution. Moreover, it avoids the unnecessary replication
of methods. For example, the methods defined by class Teacher are not replicated
in P1 and P4 because there are no object instances of class Teacher mapped to these
two nodes. This mapping strategy also makes the attribute inheritance very efficient
because all the instances of multiple classes in an inheritance hierarchy or lattice
which hold the data of the same object are mapped to the same node. Moreover,
some index structures can be established for the instances of the objects during the
mapping process to speed up future accesses.


Figure  4.6.  A  Proposed  Mapping  Strategy 
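
The first rule can be sketched in Python as follows. Since the content of Figure 4.6 is not
reproduced in this text, the assignment of OIDs 1 to 8 to processors and the choice of OIDs 3
and 5 as Teacher objects are made up for the illustration; only the resulting sets of nodes
follow the description above. The function name method_placement is hypothetical.

```python
def method_placement(object_node, object_classes):
    """Return {class: set of nodes on which its methods must be replicated}.

    object_node:    {OID: node holding all class instances of that object}
    object_classes: {OID: classes (including superclasses) of the object}
    """
    placement = {}
    for oid, node in object_node.items():
        for cls in object_classes[oid]:
            placement.setdefault(cls, set()).add(node)
    return placement

# Hypothetical assignment of OIDs 1 to 8 to the four processing nodes.
object_node = {1: "P1", 2: "P1", 3: "P2", 4: "P2",
               5: "P3", 6: "P3", 7: "P4", 8: "P4"}
object_classes = {oid: {"Person"} for oid in object_node}
for oid in (3, 5):  # assumed Teacher objects
    object_classes[oid].add("Teacher")
placement = method_placement(object_node, object_classes)
```

Because the Person and Teacher instances of one object share a node, the methods of Teacher
are needed only where a Teacher object lives (P2 and P3 in this made-up assignment).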


CHAPTER  5 
DISTRIBUTED  RESULT  COLLECTION 


5.1     Two  Architectures  for  Supporting  Result  Collection 

Recently, there has been a significant trend in both industry and research communities
to unify the relational and object-oriented database technologies. One of the
challenges this shift has brought to query processing is that both navigational and retrieval
query types should be supported.

Two architectures have been studied in our research: master-slave and
peer-to-peer. In the master-slave architecture, clients (users) submit queries to a master node.
Upon receiving queries, the master node analyzes them, applies query optimization
strategies to them, modifies them based on the placement of data and passes the modified
queries to various slave nodes. After the slave nodes finish the query processing, they
send the partial results back to the master node. In turn, the master node
assembles the partial results for each query and reports them to the clients. The
master-slave architecture is shown in Figure 5.1(a).

In the peer-to-peer architecture, a client (user) can submit a query to any node,
which analyzes the query, applies optimization strategies to it, modifies it and passes
the modified query to all the nodes that contain the relevant data. Upon receiving
a subquery, these nodes process the subqueries and the results are collected by one
of these nodes. In this architecture, the node which receives queries from clients is
called a coordinator node (C-node) and the node which processes the queries is called
a peer node (P-node). A node in a parallel machine can be a C-node and a P-node
at the same time and there can be more than one C-node in a system. The
peer-to-peer architecture is shown in Figure 5.1(b). Our implementation assumes that
the parallel computer supports the client-server architecture. The server (e.g., the
parallel computer) itself works in a master-slave or peer-to-peer mode.

In the master-slave architecture, when a retrieval query is executed, the IID-IID
pairs and the attributes of the objects are sent to the master node. The master node
has to traverse the IIDs and construct the final results. This process can be time
consuming and the master node can become a bottleneck.

Based on the peer-to-peer architecture, we introduce a distributed result
collection approach to ease the potential bottleneck caused by designating one node to collect
the results. This approach is based on a pattern-passing identification (PPI) strategy
which is a modification of the multiple wavefront identification algorithm described
in Section 4.2.2.

5.2     Pattern-passing  Identification  Strategy 

Figure 5.1. Two Architectures

The main idea of this strategy is to construct and propagate the association
information of objects in all the P-nodes which are involved in a query. In a query
graph, we assume a node C has i "AND" conditioned edges and j
"OR" conditioned edges. We call the processor which contains the instances of
node C the Pc processor. If i equals 1 and j equals 0, or if i equals 0 and j is
greater than 1, we call node C a terminal node; if i is greater than 1, we call node
C a non-terminal node. We show the PPI strategy by describing the behaviors of
terminal and non-terminal nodes below.


•  Every node shall send an end_marker to its neighboring node immediately after
it sends out a wavefront to that node.

•  If C is a terminal node in a query graph:

-  Pc will start the IID propagation process;

-  when Pc receives a wavefront of association patterns, each of which is
formed by a concatenation of associated IIDs from its neighbors, it will
concatenate the local IIDs that are associated with the incoming patterns.
If the number of end_markers Pc receives is equal to the number of
edges it has, it terminates.

•  If C is a non-terminal node in a query graph:

-  upon the arrival of each incoming wavefront of association patterns, Pc
will concatenate the incoming patterns with local IIDs according to the
IID-IID pairs available in the local memory;

-  if Pc has not received (i-1) incoming wavefronts of association patterns
from its "AND" conditioned neighboring processors and j incoming
wavefronts of association patterns from its "OR" conditioned neighboring
processors, it will wait for more wavefronts to come;

-  when Pc receives the (i-1)th wavefront, it will perform the same sequence
of operations as described in the first step and propagate the (i-1)th
intermediate results to the only remaining "AND" conditioned neighbor from
which it has not received a wavefront;

-  when Pc receives the i-th wavefront from its "AND" conditioned neighbor,
it will perform the same sequence of operations as above, propagate the
final result to all the neighbor processors except the sender of the i-th
wavefront, and terminate.

Figure 5.2 shows an example of the above procedure. The query graph is the
query pattern shown in Figure 4.3 and the object graph is as shown in Figure 3.2.
In this approach, the association pattern among the objects is constructed by all
the processors involved in a query instead of by one or more dedicated processors.
In other words, the pattern is constructed in a distributed fashion. Notice that, at
the end of the procedure, all nodes contain the same patterns of object associations
(possibly in a different order). We shall consider how to collect the results of a
retrieval query by using these patterns below.
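
The concatenation step performed by each processor can be sketched in Python. The sketch
covers only one wavefront travelling along a linear query graph, so that the last IID of an
incoming pattern always belongs to the sender; patterns are modeled as tuples, and
extend_patterns is a hypothetical name, not a routine of the implementation.

```python
def extend_patterns(patterns, pairs):
    """Concatenate each incoming pattern with the associated local IIDs.

    patterns: list of tuples of IIDs; the last IID belongs to the sender.
    pairs:    {sender IID: [local IIDs]} from the local IID-IID column.
    Patterns whose last IID has no local association are dropped.
    """
    return [p + (local,) for p in patterns for local in pairs.get(p[-1], [])]

# IID-IID columns of the running example, keyed by the sender's IID.
ra_to_grad = {"r1": ["g1"], "r2": ["g2"], "r3": ["g3"]}
grad_to_student = {"g1": ["s1"], "g2": ["s3"], "g3": ["s4"]}
student_to_section = {"s1": ["se1", "se2"], "s2": ["se1", "se2"],
                      "s4": ["se2"], "s5": ["se2"]}

wave = [("r1",), ("r2",), ("r3",)]              # started by the RA processor
wave = extend_patterns(wave, ra_to_grad)        # at the Grad processor
wave = extend_patterns(wave, grad_to_student)   # at the Student processor
wave = extend_patterns(wave, student_to_section)  # at the Section processor
```

After the last hop, the wavefront carries the complete association patterns
(r1, g1, s1, se1), (r1, g1, s1, se2) and (r3, g3, s4, se2).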

5.3     Distributed  Result  Collection 

Figure 5.2. An Example of the PPI Strategy

In a retrieval query, such as the one shown in Section 3.2, the descriptive data
(i.e., the primitive attribute values) of one or more object classes are retrieved from
secondary storage, concatenated and presented to the query issuer in an
appropriate order. By taking advantage of the PPI strategy, this process can be carried
out in a distributed and parallel fashion. We designate one node or a set of nodes for
each query as the result collection node(s), depending on the application. There can
be more than one criterion for selecting a result collection node. One
criterion is that this node contains the class(es) whose descriptive
data to be retrieved is larger than that of the other nodes. In this way, we can avoid the
transmission of the larger set of data, thus reducing the communication cost.
Load balancing is another criterion that should be considered when result collection
nodes are chosen for multiple queries. Figure 5.3 shows a possible assignment of the
result collection nodes for the modified query shown in Figure 4.3. The highlighted
nodes are the designated result collection nodes. Each result collection node is
responsible for collecting the results in a specified range (e.g., IID values or hashed IID
values). In the following sections, we shall discuss how the other nodes retrieve their
descriptive data and transfer them to the result collection nodes.

[Figure content: query graph nodes P2 Grad1, P3 Student1, P4 Section, P5 Department, P7 Grad2 and Student2, with an OR branch]


Figure  5.3.  An  Example  of  the  Result  Collection  Node  Assignment 
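
The selection criterion and the range assignment can be sketched in Python. All values below
are hypothetical: the data sizes are made up, the collector names are placeholders, and a
modulo on a numeric IID stands in for whatever hash function an implementation would use.

```python
def choose_collection_class(descriptive_bytes):
    """Pick the class whose descriptive data to be retrieved is largest."""
    return max(descriptive_bytes, key=descriptive_bytes.get)

def collector_for(iid_number, collectors):
    """Map an IID to one collector; the modulo stands in for a hash function."""
    return collectors[iid_number % len(collectors)]

# Made-up sizes of the descriptive data requested from each class.
descriptive_bytes = {"Grad": 2_000, "Student": 50_000, "Section": 8_000}
chosen = choose_collection_class(descriptive_bytes)

# Hypothetical nodes holding partitions of the chosen class.
collectors = ["node_a", "node_b"]
owners = [collector_for(iid, collectors) for iid in (4, 7, 14)]
```

With these numbers the Student data stay where they are, and the smaller Grad and Section
values are the ones shipped to the collectors.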


5.3.1     Join  Approach 

At the end of PPI, all nodes involved in the query have the same pattern of
object associations (i.e., the same set of associated IIDs). An example of the
resulting patterns which involve only two classes is shown in Figure 5.4. The first
column always contains the local IIDs and the second column contains the IIDs of
the neighboring node(s). In our implementation, the patterns on each site are ordered
according to the order of the local IIDs. Pa is highlighted to indicate that it is
a result collection node.

Obviously, one way to combine the attribute values of class B with those of class
A is for Pa and Pb to retrieve their attribute values from their disks, transfer Pb's
attribute values which satisfy a specified condition (e.g., an IID value range or a hash
value) to Pa, and perform a join operation there. This approach is very similar to the
semi-join approach used in other distributed systems. In this approach, the attribute
value of each IID is retrieved and transferred only once. However, the join operation
at Pa could be very costly. If the attributes of class B cannot be held in memory,
then they have to be stored in the secondary storage of Pa, thus requiring a lot of
I/O operations and introducing a bottleneck at Pa. This problem will be even more
serious if a join operation involves more than one neighboring class.
[Figure content: result patterns at Pa and Pb]
Figure 5.4. An Example of the Result Pattern
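
A minimal Python sketch of the join approach follows. The IIDs and attribute values are made
up (Figure 5.4 is not reproduced here), the join is done with an in-memory dictionary, and
both function names are hypothetical. It shows that each attribute value of class B is
retrieved and sent once, while the whole join work falls on Pa.

```python
def values_sent_by_pb(patterns_b, attrs_b):
    """At Pb: retrieve and send the attribute values of each needed IID once."""
    needed = sorted({b for b, _a in patterns_b})
    return {b: attrs_b[b] for b in needed}

def join_at_pa(patterns_a, attrs_a, received_b):
    """At Pa: join the local values with the received ones via the patterns."""
    return [(attrs_a[a], received_b[b]) for a, b in patterns_a]

# Patterns at Pa are (IID of A, IID of B), ordered by the local IIDs of A.
patterns_a = [(1, 14), (2, 4), (3, 14), (5, 4), (6, 14)]
# Pb holds the same patterns with its own IIDs in the first column.
patterns_b = sorted((b, a) for a, b in patterns_a)

attrs_a = {1: "a1", 2: "a2", 3: "a3", 5: "a5", 6: "a6"}
attrs_b = {4: "b4", 14: "b14"}

received = values_sent_by_pb(patterns_b, attrs_b)
result = join_at_pa(patterns_a, attrs_a, received)
```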

5.3.2     Concatenation  Approach 

In this approach, we re-order the patterns at Pb in the same order as that of
Pa (i.e., based on the IIDs in the second column). In this way, when the attribute
values of class B that satisfy a specified condition are retrieved and transferred to Pa,
they will be in the same order as the attribute values of class A. Therefore, the final
results can be obtained by simply concatenating the two streams of attribute values.
This scheme works well except that some of the disk pages which contain certain
IIDs might have to be read more than once. For example, in Figure 5.4, the page that
contains IID 4 might have to be read from disk twice; the page that contains IID 14 might
have to be read three times. The multiple reading of the same page is due to the
unordered local IIDs in the first column of Pb. Some researchers addressed this issue
by assuming a physical OID [Shek90, Lieu93] (i.e., each OID contains information
about the node and disk page of the referenced object). This approach sorts OIDs by
their page IDs before the disk access, so that multiple accesses of the same object
are avoided. In our research, we assume logical IIDs are used. Therefore, a different
approach (object cache) is taken to reduce the penalty of multiple retrievals of the
same page. The following steps are used in our approach:

•  Allocate a certain amount of memory as a cache area. The size of this area
depends on the size of the available memory.

•  Count the number of appearances of each local IID in the patterns. Store this
number together with each IID and denote it as the Total_Count. Because the
patterns are sorted by the local IIDs, the complexity of counting is O(n), where n is
the number of patterns.

•  Sort the patterns so that they are in the same order as the patterns in the result
collection node. The complexity of the sorting is O(n log n).

•  A data structure shown in Figure 5.5 is maintained in the cache area of the
memory. This structure is used to log the attribute values of a set of the most
frequently appearing IIDs. We denote this structure as A_Cache (A stands for
"attribute"). For simplicity of presentation, we use a flat table structure.
In an implementation, other data structures such as a hash table can be used.

•  The attribute values of the local IIDs needed for a retrieval query are retrieved
either from secondary storage or from the A_Cache in the following manner. For
each IID_i,

-  if Total_Count_i is equal to one, the attribute values of IID_i will be
retrieved from the disk;

-  if Total_Count_i is greater than one, the A_Cache will be checked. If IID_i
is already in it, the attribute values stored with IID_i are accessed from the
A_Cache instead of from the disk and paired with the IID_i being processed,
and Current_Count_i is decremented by one;

-  if IID_i cannot be found in the A_Cache, the attribute values of IID_i
are retrieved from secondary storage. If the A_Cache has an empty entry,
the attribute values of IID_i are logged in the A_Cache and Current_Count_i =
Total_Count_i - 1. If the A_Cache is full, the table entry with the smallest
Current_Count will be replaced by IID_i and its attribute values if the
smallest Current_Count is less than Current_Count_i.

The above approach avoids the repeated disk accesses of attribute values
associated with the set of IIDs with larger counts maintained in the A_Cache. However,
the object cache approach increases the complexity of maintaining the consistency of
OODBMSs.


IID | Current Count | Attributes

Figure  5.5.  An  Example  of  a  cache 
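
The A_Cache steps can be sketched in Python as follows. This is an illustration, not the
implemented code: fetch_attributes and read_from_disk are hypothetical names, a dictionary
replaces the flat table, the cache size is given as a number of entries, and Current_Count
is interpreted as the number of appearances of an IID still to come in the request order.

```python
from collections import Counter

def fetch_attributes(request_order, read_from_disk, cache_size):
    """Return ([(IID, attributes), ...], number of disk reads)."""
    total = Counter(request_order)  # Total_Count of every local IID
    remaining = dict(total)         # appearances still to come
    cache = {}                      # A_Cache: IID -> attribute values
    disk_reads = 0
    output = []
    for iid in request_order:
        remaining[iid] -= 1         # Current_Count after this access
        if iid in cache:
            attrs = cache[iid]      # served from the A_Cache
        else:
            attrs = read_from_disk(iid)
            disk_reads += 1
            if remaining[iid] > 0:  # worth caching only if needed again
                if len(cache) < cache_size:
                    cache[iid] = attrs
                elif cache:
                    victim = min(cache, key=lambda k: remaining[k])
                    if remaining[victim] < remaining[iid]:
                        del cache[victim]
                        cache[iid] = attrs
        output.append((iid, attrs))
    return output, disk_reads

disk = {4: "a4", 7: "a7", 14: "a14"}  # made-up attribute values
order = [14, 4, 7, 14, 4, 14]         # local IIDs in the collector's order
rows, reads_with_cache = fetch_attributes(order, disk.__getitem__, cache_size=2)
_, reads_without_cache = fetch_attributes(order, disk.__getitem__, cache_size=0)
```

For this request order a two-entry cache reduces the number of disk reads from six to three,
one per distinct IID.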


CHAPTER  6 
QUERY  OPTIMIZATION  STRATEGIES 

The  multiple  wavefront  algorithms  have  been  designed  to  achieve  a  high  degree 
of  parallelism.  However,  a  high  degree  of  parallelism  does  not  necessarily  guarantee 
the  maximal  efficiency,  since,  as  we  shall  explain,  processors  can  be  kept  busy  doing 
nonproductive  work.  Furthermore,  in  the  situation  of  multiple  queries,  there  is  a  large 
number  of  queries  to  be  processed  simultaneously.  It  is  more  desirable  to  allocate 
some  nodes  to  process  other  queries  than  to  let  them  be  committed  to  one  particular 
query  and  do  nonproductive  computations.  Let  us  look  closer  into  this  problem  and 
its  possible  solutions  from  both  single  query  and  multiple  query  points  of  view. 

6.1     Parallelism  Not  Equal  to  Efficiency 

We use an example to illustrate the problem. Our discussion is still based on
the schema graph of Figure 3.1. We assume that there are 300 RAs at a university
with a student body size of 10,000 and a graduate student body size of 4,000. Both
r1 and r2 have 20-hour appointments, and the rest of the RAs' appointments are either
10 hours or 15 hours. s1's GPA is 3.5 and there are 4,000 students with a GPA of
3.5. s2's GPA is 3.6 and there are 800 students with a GPA of 3.6. Finally, there
are 10 departments at the university, and together they offer 300 sessions of courses.
The object graph is shown in Figure 6.1. For simplicity, we map each object
class to a processor node. However, if horizontal partitioning is applied to any object
class, Figure 6.1 could be considered as the object graph for a specific combination of
data partitions. Therefore, in the rest of this section, the terms "class" and
"partition" are interchangeable.

Now  the  query  is  "Find  the  students  who  are  RAs  with  a  20-hour  appointment 
and  3.5  GPA,  and  also  find  those  sections  that  the  students  are  taking.  Retrieve  their 
names,  their  GRE  scores,  and  their  department  names." 

It  can  be  written  in  OQL  as  below: 

context RA*Grad*Student AND (*Section, *Department)
where RA.hrs = 20 ∧ Student.gpa = 3.5
retrieve Student.name, Grad.gre, Department.name

If the identification approach is used, as shown in Figure 6.2, P3, which processes
the Student class, would most likely receive the IIDs propagated from the Section and
Department classes before it receives the stream of IIDs propagated from the RA
class. If most of the object instances in the Section and Department classes are
connected with the object instances in the Student class, then after applying the local
selection condition and processing the two incoming wavefronts, P3 will send 4,000
IIDs to P2. P2 will then have to spend a considerable amount of time processing these
IIDs, most of which do not contribute to the end result. But based on our assumed
object graph, we can see that the local selections of the RA, Department and Section
classes produce very few IIDs. If the processor that stores and processes the Student
class simply waits for all the wavefronts (including the one propagated from P1 and
P2) to come from its neighboring classes, it will send out only a very limited number
of IIDs to its neighboring classes. The communication bandwidth between P2 and
P3 as well as the CPU time of P2 can thus be saved and used for processing other
concurrent queries. Also, based on the data specified in the retrieval statement of the
query, the attribute values associated with the RA and Section classes are not needed.
Therefore, the wavefront propagations from P3 (the Student class) towards P4 (the
Section class) and from P2 (the Grad class) towards P1 (the RA class) are not needed.
The algorithm can terminate at step 4 of Figure 6.2.


Figure 6.1. A Modified Object Graph


From the above example, we can see that a number of optimization strategies
can be introduced for graph-based object-oriented query processing in a parallel
environment. By starting a query at some selected nodes and/or by controlling the
directions and the extent to which a wavefront of IIDs propagates, we can reduce not
only the response time of an individual query but also the overall processing time
of concurrent queries, since nonproductive computation can be avoided. In the
following sections, four strategies for query optimization in multiple wavefront
algorithms are introduced. They aim to avoid excessive or unnecessary IID transfers
between nodes while maintaining an appropriate degree of parallelism, in order to free
some processors from computations that do not contribute to the end results of queries.

Figure 6.2. A New Query Execution Plan

These optimization strategies will work if the system can pre-determine or
pre-estimate the number of instances of each class which may satisfy a query. Fortunately,
several researchers have done interesting work in relational DBMSs to estimate the
size of the outcome of selection and join operations [Ling92, SunW93]. Those of their
size estimation techniques that do not generate much run-time overhead are very
suitable for our application. The CON array and connection information (i.e.,
IID-IID pairs) shown in Figure 4.4 are also very useful for estimating the number of
distinct IIDs to be sent to neighboring nodes after a local selection. A well-designed
query optimizer should be able to collect this information periodically and use it to
establish efficient execution plans.

Before  we  present  the  optimization  strategies,  we  define  a  couple  of  parameters 
which  are  used  to  characterize  a  query  graph. 

IID-size: IID-size is the estimated number of distinct IIDs that a node will send
to a neighboring node after its local processing (i.e., local selection and instance
connectivities with the neighbor). In case a node is connected with more than one
node involved in a query graph, the IID-size is the average of the IID-sizes
over all the node pairs. For example, based on the object graph of Figure 6.1,
when the example query pattern is applied, the IID-size of RA.hrs=20 is 2. Varying
the value of this parameter represents the effect of changing the number of instances
of a node, the selectivity factor associated with instance selection based on attribute
value(s), and/or the connectivity of its instances with the instances of the neighboring
nodes.

Dia: Dia, or diameter, stands for the longest distance between any two processing
nodes to which the object classes referenced in a query are mapped. It is the number
of nodes along the longest path between two terminal nodes. For example, in
our example query shown in Figure 3.3, the Dia is 4. The Dia value represents the
distance of a wavefront propagation and thus determines its communication cost.
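As an illustration of the two parameters, the following sketch computes Dia for a tree-shaped query graph and averages per-neighbor estimates into an IID-size. The function names, the adjacency-list representation and the choice of the Chapter 6 example query graph are assumptions made for this sketch only; the per-neighbor estimates themselves are taken as given inputs.

```python
from collections import deque

def farthest(adjacency, start):
    """Breadth-first search; returns the farthest node from start and the
    number of nodes on the path to it (the start node counts as 1)."""
    nodes_on_path = {start: 1}
    queue = deque([start])
    last = start
    while queue:
        node = queue.popleft()
        last = node  # BFS dequeues in non-decreasing distance order
        for nxt in adjacency[node]:
            if nxt not in nodes_on_path:
                nodes_on_path[nxt] = nodes_on_path[node] + 1
                queue.append(nxt)
    return last, nodes_on_path[last]

def dia(adjacency):
    """Number of nodes on the longest path of a connected, acyclic query graph
    (two BFS passes; this shortcut is only valid for trees)."""
    any_node = next(iter(adjacency))
    end, _ = farthest(adjacency, any_node)
    _, length = farthest(adjacency, end)
    return length

def iid_size(estimates_per_neighbor):
    """Average of the estimated distinct-IID counts over all neighbors."""
    values = list(estimates_per_neighbor.values())
    return sum(values) / len(values)

# Assumed shape of the example query RA*Grad*Student AND (*Section, *Department).
query_graph = {
    "RA": ["Grad"],
    "Grad": ["RA", "Student"],
    "Student": ["Grad", "Section", "Department"],
    "Section": ["Student"],
    "Department": ["Student"],
}
```

For this graph the longest path is RA, Grad, Student, Section (or Department), so `dia(query_graph)` counts four nodes.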

We  now  proceed  to  present  the  optimization  strategies. 

6.2     Intraquery Scheduling Strategy

In a single query with N object classes, if the IID-size value of a particular
class is relatively large when compared with the IID-sizes of other classes, the node
that manages it should act as a "passive" node. A passive node will not become
active until it receives all the wavefronts from its AND branches. Only one node
in a query graph can act as a passive node because a query graph with *AND
branches would only propagate a wavefront of IIDs after having received all but
one wavefront of IIDs. If more than one node acted as a passive node, a deadlock
would occur. What counts as "relatively large" can be determined by using a threshold,
a value between the largest IID-size and the smallest IID-size that needs to be
determined by a performance evaluation study. If no class has an unusually large
IID-size, the generic identification approach described before is applied.
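A minimal sketch of this selection rule is given below. The function name and the rule of picking the single largest class when several exceed the threshold are assumptions; apart from the 4,000 IIDs of the Student node and the IID-size of 2 for RA, the sample values are invented for illustration.

```python
def pick_passive_node(iid_sizes, threshold):
    """Return the one node that should be passive, or None when no IID-size
    exceeds the threshold (fall back to the generic identification approach).
    iid_sizes must be a non-empty mapping: class name -> estimated IID-size."""
    largest = max(iid_sizes, key=iid_sizes.get)
    return largest if iid_sizes[largest] > threshold else None

# Student and RA follow the running example; the other values are hypothetical.
sizes = {"RA": 2, "Grad": 300, "Student": 4000, "Section": 250, "Department": 10}
passive = pick_passive_node(sizes, threshold=1000)
```

Returning at most one node mirrors the requirement that only one node in a query graph may be passive.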

This strategy avoids the propagation of a large number of IIDs while maintaining
some degree of parallelism. Figure 6.2 follows this rule. The numbers of IIDs to be
sent out by the RA, Grad, Section and Department nodes are smaller than the number
to be sent out by the Student node (4,000). Thus, the Student node should be a passive
node. Otherwise, a lot of processing time will be wasted, and other concurrent queries
will not be able to benefit from those nodes which are heavily engaged in processing the
incoming IIDs and yet produce results that do not contribute to the final result of
the query.

In the identification approach, a non-terminal node will wait until it receives
i-1 wavefronts from the i neighbors that are involved in an AND construct. We call this
kind of node a semi-passive node, as opposed to the passive node we just described.

We assume a node C has i AND-conditioned edges in the query graph,
and we call the processor which contains the instances of node C the Pc processor.
We now describe the behaviors of the different kinds of nodes/processors as follows:

•  If  Pc  is  an  active  processor,  Pc  will  start  the  IID  propagation  process.  If  it 
receives  the  stream  of  IIDs  from  its  only  neighbor,  it  will  mark  those  instances 
as  qualified  instances  and  then  terminate. 

•  If Pc is a semi-passive processor, and

-  if Pc has received fewer than (i-1) incoming streams of IIDs from
its neighboring processors, it will wait for more streams of IIDs to come;

-  if Pc has received streams of IIDs from all its neighbors but one,
it will process the (i-1) streams and select those C instances that satisfy the
selection condition and the query pattern. Then, it will send those IIDs
that are associated with the instances of the only remaining neighboring
node to the corresponding processor;

-  if Pc has received the ith (i.e., the last) incoming stream of IIDs,
it will form the final result of the query for node C and then pass the
IIDs that are associated with the instances of all other neighbors to these
neighbors except the sender of the ith stream.

•  If Pc is a passive processor, and

-  if Pc has received fewer than i incoming streams of IIDs from its
neighboring processors, it will wait for more streams of IIDs to come;

-  if Pc has received i incoming streams of IIDs, one from each of its
neighboring processors, it will process those IIDs and form the final result of
the query for node C (i.e., the set of C instances that satisfy the query graph)
and then pass those IIDs in the resulting set that are associated with the
instances of all the neighbors to these neighbors.
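The three behaviors can be summarized as a small decision function. This is only a sketch: the function name, the role strings and the action labels are assumptions, and the actual stream processing is left out.

```python
def next_action(role, i, received):
    """Decide what processor Pc does next.

    role     -- "active", "semi-passive" or "passive"
    i        -- number of AND-conditioned edges of node C
    received -- number of incoming IID streams received so far
    """
    if not 0 <= received <= i:
        raise ValueError("received must be between 0 and i")
    if role == "active":
        # An active node starts the propagation; the stream that later comes
        # back from its only neighbor marks the qualified instances.
        return "start_propagation" if received == 0 else "mark_qualified_and_terminate"
    if role == "semi-passive":
        if received < i - 1:
            return "wait"
        if received == i - 1:
            return "send_to_remaining_neighbor"
        return "finalize_and_send_to_all_but_last_sender"
    if role == "passive":
        return "wait" if received < i else "finalize_and_send_to_all_neighbors"
    raise ValueError("unknown role: " + role)
```

The only difference between the semi-passive and passive cases is the extra send at i-1 streams, which is exactly the propagation a passive node withholds.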

The above strategy is a greedy approach. Another possible approach is
randomized optimization [Ioan90]. In the randomized optimization approach, the
query optimizer randomly picks one of the N nodes as a passive node and computes the
cost of the resulting query evaluation plan (QEP). This process is repeated for another
node, and the query optimizer keeps the cheaper QEP as the current QEP. After evaluating
all possible QEPs (or stopping at some pre-defined termination condition), the query
optimizer has an optimal (or near-optimal) QEP. The drawback of this approach
is that it involves very complicated cost functions which are quite difficult to define
and validate in a parallel environment. However, it may provide a better query
processing plan some of the time.

A hybrid approach can be used to reduce the search space and still achieve a
satisfactory QEP. This approach picks the node that has the largest IID-size as a
passive node and evaluates the cost of the QEP. Then, the node that has the second
largest IID-size is picked and the cost of its QEP is computed. The two costs
are compared and the cheaper QEP is kept. This process repeats until
the termination condition is met. The termination condition could be that either there
have been X consecutive nonproductive attempts or all the nodes have
been picked. A nonproductive attempt means that the
cost of a new QEP is greater than the cost of the best QEP identified so far. The
constant X is a system parameter. This approach is based on the same heuristic rule
that a node with a large IID-size will most likely send out a large number of IIDs to
its neighboring nodes, some of which may not contribute to the final result.
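The hybrid search can be sketched as below. The cost function is passed in as a parameter because the text does not define one; `plan_cost`, `max_nonproductive` (the constant X) and all sample numbers other than the running example's 4,000 and 2 are assumptions. A QEP that is not strictly cheaper than the best one is treated as nonproductive here.

```python
def choose_passive_node(iid_sizes, plan_cost, max_nonproductive):
    """Try passive-node candidates in decreasing IID-size order.

    iid_sizes         -- mapping: node -> estimated IID-size
    plan_cost         -- hypothetical callable: node -> cost of the QEP in
                         which that node is the passive node
    max_nonproductive -- the system parameter X
    Returns (best_node, best_cost).
    """
    candidates = sorted(iid_sizes, key=iid_sizes.get, reverse=True)
    best_node, best_cost = None, float("inf")
    misses = 0  # consecutive nonproductive attempts
    for node in candidates:
        cost = plan_cost(node)
        if cost < best_cost:
            best_node, best_cost = node, cost
            misses = 0
        else:
            misses += 1
            if misses >= max_nonproductive:
                break
    return best_node, best_cost

sizes = {"Student": 4000, "Grad": 300, "Section": 250, "Department": 10, "RA": 2}
costs = {"Student": 50, "Grad": 80, "Section": 90, "Department": 20, "RA": 10}
best = choose_passive_node(sizes, costs.get, max_nonproductive=2)
```

With X = 2 the search stops after Grad and Section both fail to improve on Student; a larger X explores more of the search space at the price of more cost evaluations.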

6.3     Partial  Graph  Processing  Strategy 

In an OODBMS, a query can be expressed as a query graph in which each node
represents a class involved in the query. In many cases, only the descriptive data from
a limited number of nodes are of interest and are to be retrieved or processed. The
rest of the nodes in the graph are used only for determining the connectivities among
object instances. For example, in the query "Find the names of the graduate research
assistants whose RA assignment is 20 hours", there are three nodes involved, namely, RA,
Grad and Student. But the query issuer is only interested in retrieving the names from the
Student node. For this kind of query, the following procedure can be used to achieve
more efficient processing:

•  Mark  those  nodes  whose  descriptive  data  are  of  interest  as  having  status  1. 

•  Mark  the  nodes  which  are  between  the  status  1  nodes  as  having  status  2. 

•  Mark  the  rest  of  the  nodes  according  to  their  distance  from  the  status  1  or  2 
node.  The  immediate  neighbor  of  a  status  1  or  2  node  will  be  marked  as  having 
status  3. 

•  The  generic  identification  or  elimination  algorithm  is  applied;  however,  IID 
wavefronts  in  some  directions  will  be  suppressed  by  following  the  rules  given 
below. 

•  The wavefronts between status 1 or 2 nodes are propagated.

•  The  wavefronts  from  a  higher  numbered  node  to  a  lower  numbered  node  are 
propagated. 

•  The  wavefronts  from  a  lower  numbered  node  to  a  higher  numbered  node  are 
suppressed. 
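The marking procedure and the suppression rules can be sketched for a tree-shaped query graph as follows. The leaf-pruning step used to find the status 2 nodes, the function names, and the sample graph (chosen so that B and F hold the data of interest and E lies between them) are assumptions of this sketch, not a reproduction of the implemented numbering code.

```python
from collections import deque

def mark_status(adjacency, targets):
    """Assign status numbers to the nodes of a tree-shaped query graph.
    targets are the nodes whose descriptive data are of interest."""
    targets = set(targets)
    degree = {n: len(adjacency[n]) for n in adjacency}
    pruned = set()
    # Repeatedly strip non-target leaves; what remains is the set of target
    # nodes plus the nodes lying on paths between them.
    queue = deque(n for n in adjacency if degree[n] <= 1 and n not in targets)
    while queue:
        n = queue.popleft()
        if n in pruned:
            continue
        pruned.add(n)
        for m in adjacency[n]:
            if m not in pruned:
                degree[m] -= 1
                if degree[m] <= 1 and m not in targets:
                    queue.append(m)
    status = {n: (1 if n in targets else 2) for n in adjacency if n not in pruned}
    # Number the remaining nodes by distance from the status 1/2 area: 3, 4, ...
    frontier = deque(status)
    while frontier:
        n = frontier.popleft()
        for m in adjacency[n]:
            if m not in status:
                status[m] = max(status[n], 2) + 1
                frontier.append(m)
    return status

def propagates(status, src, dst):
    """True if a wavefront from src to dst is sent, False if it is suppressed."""
    if status[src] <= 2 and status[dst] <= 2:
        return True
    return status[src] > status[dst]

# Hypothetical query graph: A-B, B-E, E-F, E-C, C-D.
graph = {"A": ["B"], "B": ["A", "E"], "E": ["B", "F", "C"],
         "F": ["E"], "C": ["E", "D"], "D": ["C"]}
status = mark_status(graph, {"B", "F"})
```

In this example A and C receive status 3 and D receives status 4, so wavefronts flow from D to C to E and from A to B, but never in the opposite directions.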

Figure 6.3 is used to illustrate the numbering scheme. In the query graph, nodes
B and F contain the data to be retrieved. They are marked as having status 1. Node
E is in between nodes B and F, so it is marked as having status 2. The rest of the
nodes are marked as having status 3 or 4 according to their distances from a
status 1 or 2 node.

The purpose of marking nodes with ordered numbers is to distinguish which
part of the query graph is of interest and which direction is towards or away from
the area of interest. If a wavefront is propagated away from the area of interest, the
propagation can be suppressed. However, the end_markers should still be passed on
so that every processor involved in the processing of the query graph can count the
number of end_markers it receives to tell when the algorithm should be terminated
in that processor.

This second optimization strategy is applicable to both the identification and
elimination algorithms. The time saving of this strategy is twofold. First of all, the
suppression of some IID propagations reduces the communication cost. Secondly, the
processors that hold the classes that do not contain the data to be retrieved contribute
only to the identification of object instance connectivities (i.e., the propagation of
wavefronts from higher numbers to lower numbers), thus leaving themselves more
time for processing some other queries.

The only overhead introduced by this approach is that of marking the nodes with
ordered numbers. If the query graph is a tree, the cost of the numbering is very small (a
variation of the depth-first search algorithm is used in our implementation). If the
query graph is cyclic, the stated numbering scheme cannot be directly applied. Many
approaches have been proposed to process cyclic queries [Kamb82, Kamb85, TayY89].
These methods first convert a cyclic query into a tree and then apply a tree-structured
query processing procedure to find the result. An approach proposed in [Chen95]
identifies the cyclic components in a query graph first, and then uses a combination of
the identification and elimination algorithms to process the cyclic components. When
this procedure finishes, the query graph can be converted to an acyclic graph so that
the two query optimization strategies as well as the numbering scheme presented in
this section can be applied.


Figure  6.3.  An  Example  of  the  Numbering  Scheme 

6.4     Interquery Scheduling and Common Pattern Sharing Strategies

Our goal for interquery scheduling is to exploit the sharing of a common pattern.
Concurrent queries can share their processing results in three ways. First, the final
processing result of one query can be used by other queries. Second, the intermediate
results of one query can be used by other queries. Third, some of the costly
operations can be shared by queries (e.g., selection operations and accessing data from
the secondary storage). We will discuss them in turn.

Sharing conditions: There are different criteria for sharing. In our approach,
the following two requirements have to be met by a query before it can share the
result of another query:

1.  The nodes (object classes) and edges (their associations)
of its query graph form a superset of or the same set as
those of the other query.

2.  The local selection conditions of the nodes (object classes)
in its query graph are equally or more restrictive than the
ones specified in the other query graph.

For example, we have a set of queries as follows:
Q1: A[a1 > 10] * B[b1 = 10] * C
Q2: A[a1 > 10] * B[b1 = 10] * C * E
Q3: A[a1 = 20] * B[b1 = 10] * C
Q4: A[a1 > 10] * D * F

According to the two conditions given above, Q2 and Q3 can share the final
result of Q1. Q4 cannot share the final results of the other queries because it
violates condition 1. However, the local selection operation on class A in Q4
can be shared with the same operation on class A in the other queries (we shall
discuss this approach in greater detail in Section 6.5).
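The two sharing conditions can be checked mechanically for simple predicates. The sketch below is an illustration under stated assumptions: a query is represented as a pair (selection per class, set of edges), a predicate is either None or a tuple (attribute, operator, value) with the operator limited to ">" and "=", and the helper names `implies`, `chain` and `can_share` are invented here.

```python
def implies(p, q):
    """True if predicate p is equally or more restrictive than predicate q."""
    if q is None:              # no condition: anything is at least as restrictive
        return True
    if p is None or p[0] != q[0]:
        return False
    (_, p_op, p_val), (_, q_op, q_val) = p, q
    if q_op == "=":
        return p_op == "=" and p_val == q_val
    # q has the form "attribute > q_val"
    return p_val > q_val if p_op == "=" else p_val >= q_val

def chain(*classes):
    """Edges of a linear pattern such as A*B*C."""
    return {frozenset(pair) for pair in zip(classes, classes[1:])}

def can_share(query, base):
    """True if query may reuse the result of base."""
    q_nodes, q_edges = query
    b_nodes, b_edges = base
    # Condition 1: the nodes and edges of query cover those of base.
    if not (set(b_nodes) <= set(q_nodes) and b_edges <= q_edges):
        return False
    # Condition 2: every selection in query is at least as restrictive.
    return all(implies(q_nodes[c], b_nodes[c]) for c in b_nodes)

Q1 = ({"A": ("a1", ">", 10), "B": ("b1", "=", 10), "C": None}, chain("A", "B", "C"))
Q2 = ({"A": ("a1", ">", 10), "B": ("b1", "=", 10), "C": None, "E": None},
      chain("A", "B", "C", "E"))
Q3 = ({"A": ("a1", "=", 20), "B": ("b1", "=", 10), "C": None}, chain("A", "B", "C"))
Q4 = ({"A": ("a1", ">", 10), "D": None, "F": None}, chain("A", "D", "F"))
```

Applied to the four example queries, the check accepts Q2 and Q3 as sharers of Q1 and rejects Q4, which agrees with the discussion above. General predicate implication is harder than this two-operator case.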

Having specified the conditions of sharing, we now examine the structures of
sharing that can exist among queries.

Structures of Sharing: There is a need to introduce some structures of sharing
to provide a clear and more expressive way of describing the execution order of
queries in a parallel or distributed environment. For example, the four queries we
just discussed can have the structure shown in Figure 6.4.

[Figure 6.4 layout: first level: Q1, Q4, Q3(A[a1=20]); second level, under Q1: Q2, Q3]

Figure 6.4. An Example Set of the Structures of Sharing

This set of structures carries several meanings. Firstly, it says that Q1 will be
executed before Q2 and Q3, and that Q2 and Q3 can share the result of Q1. Secondly,
it indicates that Q4 cannot share with Q1, Q2 or Q3; however, the local selection
operations of Q1, Q4 and Q3(A[a1 = 20]) are sharable. We say that Q1, Q2
and Q3 are in the same structure of sharing and Q4 by itself is in another structure.
Given a set of queries, more than one structure can be constructed. In our example,
Q1, Q4 and Q3(A[a1 = 20]) are at the first level (a higher level) while Q2 and Q3 are
at the second level (a lower level). The reason Q3(A[a1 = 20]) is also placed at the
first level is that its local selection condition is different from Q1's, and the instances
of class A obtained by processing Q1 cannot be used for processing Q3(A[a1 = 20]).
However, by placing Q3's selection condition over A at the top level, that selection
can be shared with those of Q1 and Q4 because these three queries will be
processed concurrently.

We note here that in a structure of sharing, the higher level query is
less restrictive than the lower level queries. Thus, the query result of a
higher level query is a superset of the result of a lower level query. The
structure of sharing can be used as the structure for query scheduling. Thus, the
interquery scheduling is based on the sharing of common subpatterns.

The  above  example  illustrates  that  a  structure  of  sharing  is  very  convenient  in 
describing  the  execution  order  of  queries  and  the  activation  of  processors  that  manage 
different  classes.  Now,  we  proceed  to  introduce  three  basic  structures  of  sharing. 

Basic Structures of Sharing: A complex structure consists of one or more
of the three basic structures shown in Figure 6.5. Structure A is the simplest
one. Those nodes in Q2 that have corresponding nodes in Q1 will wait until those
nodes in Q1 finish processing (i.e., after receiving all the end_markers). The nodes
in Q2 will use the query results of those nodes in Q1 as their local selection results.
Moreover, in the elimination algorithm, the CON array of a node in Q1 can also be
used by the node in Q2 so that Q2 will be able to resume the processing at the place
where Q1 left off, rather than start the processing all over again. We note here that
those nodes in Q2 that do not appear in Q1 can start the query processing without
waiting for Q1 to finish. They can also participate in the sharing of distributed local
selections to be discussed later. Case AA is an example of this kind of structure,
in which the result of A*B can be used for processing A*B*C and the processor of
node C can start a wavefront algorithm without waiting for the processing of A*B
to complete.

[Figure 6.5 panels: (A), (B) and (C) show the three basic structures. Example (AA):
A*B above A*B*C. Example (BB): B*C*D and C*D*E above A*B*C*D*E*F. Example (CC):
B*C*D above A*B*C*D and B*C*D*E.]

Figure 6.5. Three Basic Structures of Sharing

Case B describes the situation in which a complex query can share the results
of several smaller queries. Similarly, the nodes in Q3 can use the query results of the
corresponding nodes as their local selection results. If one node appears in both Q1
and Q2, its results from Q1 and Q2 will be intersected and used by the corresponding
node in Q3 as its local selection results. However, when it comes to CON array
sharing, Q3 can use the CON arrays from either Q1 or Q2, but not both, to avoid a
possible inconsistency. Case BB is an example of structure B. The results produced for
C and D in the B*C*D and C*D*E processing will be intersected, and the intersections
will be used as the local selection results of C and D, respectively. In the elimination
approach, the resulting CON array of either the B*C*D or the C*D*E processing will be
used by the corresponding nodes in the processing of A*B*C*D*E*F.

The third basic structure is shown in C. This case allows a query pattern to be
shared by several other queries. The query result sharing and the CON array sharing
between each pair of high level and low level queries are the same as in Case A.
Case CC is an example of structure C.

We shall now discuss in greater detail what can be shared in a structure
of sharing.

Query  results  sharing:  In  a  structure  of  sharing,  the  result  of  a  higher  level 
query  can  be  used  by  lower  level  queries  as  their  local  selection  results  if  they  have 
the  same  local  selection  conditions  so  that  the  local  selection  operations  of  the  lower 
level  queries  are  not  necessary.  However,  if  the  lower  level  queries  do  not  have  the 
corresponding  object  nodes  in  the  higher  level  query,  the  local  selection  operations 
of  these  nodes  will  have  to  be  done  separately.  For  those  nodes  whose  selection 
conditions  are  more  restrictive  than  those  of  the  corresponding  nodes  in  the  upper 
level,  these  selection  operations  would  have  been  moved  to  the  upper  level.  Their 
results  are  readily  available  for  these  lower  level  selection  operations.  After  the  local 
selections,  a  multiple  wavefront  algorithm  can  start. 

Intermediate result sharing: If the elimination algorithm is used, the CON
arrays of a higher level query, which record some intermediate results, can be shared
by the lower level queries. The CON array sharing enables the lower level queries to
start their query processing from where the higher level query ends rather than from
the very beginning. This strategy will reduce the communication and processing
costs. Similar to the case of query result sharing, if the lower level queries cannot
find the corresponding nodes from which to copy the CON arrays, they will copy
them from the database (i.e., the original CON array). Also, if there is more than
one query at the higher level, the structure of sharing should specify a query from
which all the lower level queries should copy the CON arrays. After that, the CON
arrays of the source query should be freed, thus making the memory space available
to other tasks.

Common operation sharing: In a structure of sharing, some operations are
common to all queries, and some of them are very time-consuming, such as the
retrieval of the final result. These common operations can be shared. The two-phase
query processing technique discussed in Section 3 postpones the retrieval of the final
result to the second phase. The common operation sharing strategy postpones the
retrieval of the final result even further, i.e., to the end of executing a structure of
queries. This approach takes advantage of the structural property of such a structure,
i.e., the query result of a higher level query is a superset of the result of a lower level
query. In processing a structure of queries, only those data (attribute values) needed
for a high level query are retrieved from the secondary storage. The lower level
queries access their data from the data already loaded in main memory instead of
from secondary storage. This approach can greatly reduce the I/O cost. However,
it may increase the response time of some individual queries. To solve this problem, a
pipelining approach for the construction of final retrieval results can be used. The
final result of a query is constructed after object instances have been traversed by slave
nodes and those instances that satisfy the local selection conditions and the context
specification have been marked. The collection of the final retrieval result can be done
in the following pipelined fashion. As soon as a processor completes its processing of
a query, its data in the form of IID-Attribute-Value pairs, which constitute the final
retrieval result, are sent to a processor responsible for constructing the final result.
Data arrive at the construction processor at different times, depending on when
the involved processors complete the processing of a structure of sharing, thus
forming a pipeline of data. The construction processor assembles the received
data based on the IID information provided in the data streams to construct the final
retrieval result.

6.5     Distributed  Sharing  of  Selection  Operations 

In  order  to  achieve  the  sharing  of  the  results  of  selection  operations,  there  must 
be  some  processor(s)  which  is  responsible  for  the  identification  of  sharable  selection 
conditions.  We  use  a  distributed  approach  to  achieve  this  identification.  In  this  ap- 
proach, all  the  processors  that  contain  object  classes  referenced  by  a  set  of  concurrent 
queries  are  given  the  structures  of  sharing  as  well  as  the  queries.  They  independently 
examine  the  top  level  queries  in  these  structures  (note:  only  the  top  level  queries 
do  selections)  to  determine  if  their  selection  conditions  make  references  to  the  same 
object  classes  in  their  possession.  If  such  selection  conditions  and  object  classes  are 
identified,  the  selection  conditions  are  compared  to  determine  if  they  are  sharable. 
Thus, the decision on sharability is made in a parallel and distributed fashion instead
of in a centralized fashion; the latter approach may cause a bottleneck in a
multi-query processing system. The tasks needed for sharing the results of selection
operations in each processor depend on the storage (or access path) structure
used. For example, if there is no index established for an attribute, all its attribute
values in the instances of an object class are accessed from the secondary storage
once and used to process the selection conditions of all the queries that make
reference to the attribute. If an index is available, index accesses and their
results can be shared instead. Either way, the amount of I/O is reduced.
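
The no-index case described above, in which one pass over the stored attribute values
serves several queries, can be sketched as follows. The function name shared_scan, the
query names and the sample column are hypothetical; the sketch only illustrates the idea
of reading each value once and testing it against every query's selection condition.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

Predicate = Callable[[int], bool]


def shared_scan(column: Iterable[Tuple[int, int]],
                predicates: Dict[str, Predicate]) -> Dict[str, Set[int]]:
    """Evaluate every query's selection condition in one pass over a column.

    `column` yields (iid, attribute_value) pairs; `predicates` maps a
    hypothetical query name to its condition on that attribute.
    """
    selected: Dict[str, Set[int]] = {name: set() for name in predicates}
    for iid, value in column:                 # each stored value is read once
        for name, condition in predicates.items():
            if condition(value):
                selected[name].add(iid)
    return selected


ages = [(1, 25), (2, 41), (3, 33), (4, 58)]
hits = shared_scan(ages, {
    "Q1": lambda v: v > 30,
    "Q2": lambda v: v < 40,
})
```

With k queries on the same attribute, the column is read once rather than k times, which
is where the I/O saving comes from.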


CHAPTER  7 
PERFORMANCE  EVALUATIONS 


We have implemented the two generic multi-wavefront algorithms, result collection
strategies based on both master-slave and peer-to-peer architectures, and the four
optimization strategies presented in the previous sections. Our implementation platform
is a 64-node nCUBE 2 parallel computer [nCU92]. The software organization
of the two architectures is shown in Figure 7.1.

7.1     Benchmark  and  Application  Domains 

In  our  evaluation,  the  benchmark  queries  introduced  in  Thakore's  work  [Thak94] 
are  used.  We  did  not  use  the  benchmarks  proposed  in  other  works  [Ande90,  Catt92, 
Care93] because they contain much simpler query types. The query types used are
shown in Figure 7.2. The characteristics of each query type are as follows:

Type I: Queries involve the manipulation of complex objects. Figure 7.2(a) shows
the structure of the subschema processed by the queries. Object class C1 models a
set of complex objects. Complex objects are composed of objects of other classes
and are modeled as an aggregation hierarchy. In the figure, objects of class C1 are
composed of objects of classes C2 and C3.

Type II: Queries involve the manipulation of complex objects and the inheritance
of attributes. Figure 7.2(b) shows the structure of the subschema processed by queries
of this type. As can be observed from the figure, in addition to the manipulation
of the aggregation hierarchy, the inheritance of attributes through the generalization
association (labeled G) is also involved in query processing. The dynamic model of
inheritance is assumed, meaning the attributes and values associated with objects of
a superclass are defined and stored in the superclass rather than in its subclasses.

Type III: Queries involve the interaction (or relationship) of complex objects
with inheritance of attributes. In Figure 7.2(c), classes C1 and C4 model two sets
of complex objects. Objects of class C1 inherit attributes from class C8. Class C7
models objects that capture the interaction between the complex objects of classes
C1 and C4.

In addition to the above benchmark queries, query types with single-class selection
and with two-class and three-class association (join) are also evaluated.

In our performance evaluations, we evaluate the proposed query processing and
optimization strategies, compare the speedup and scaleup of the two architectures
for result collection, and evaluate the data placement strategies over two different
application domains.

7.2     Evaluations  of  Optimization  Strategies 

Using the implemented system, we evaluate the performance of the four optimization
strategies. We compare the performance of query processing with optimization
against the performance without optimization. The response time of a query is defined
as the time it takes for the slave nodes to receive the query from the master
node, process it and construct the final results. The total CPU time of a query is
the sum of the processing times of the slave nodes which are involved in the
processing of the query (excluding the idle time). We obtain these data
by using the execution profiling tool provided by the nCUBE. In constructing the
test database shown in Figure 7.2(c), we map one object class to one processor
(class-per-node), and each processor has its own I/O channel.
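
The two metrics defined above can be computed from per-node timing data as in the
following sketch. The profile layout (a list of busy intervals per slave node, with the
clock starting when the master node sends the query) is an assumption made for
illustration and is not the output format of the nCUBE profiling tool.

```python
from typing import Dict, List, Tuple

# Hypothetical profile: per slave node, a list of (start, end) busy intervals
# in seconds, measured from the moment the master node sends the query.
Profile = Dict[str, List[Tuple[float, float]]]


def response_time(profile: Profile) -> float:
    """Elapsed time until the last involved slave node finishes."""
    return max(end for intervals in profile.values() for _, end in intervals)


def total_cpu_time(profile: Profile) -> float:
    """Sum of busy time over all involved slave nodes; idle gaps are excluded."""
    return sum(end - start
               for intervals in profile.values()
               for start, end in intervals)


profile = {
    "node1": [(0.0, 2.0), (3.0, 5.0)],   # idle between 2.0 and 3.0
    "node2": [(0.5, 2.0)],
}
elapsed = response_time(profile)
busy = total_cpu_time(profile)
```

The two numbers can move independently: an optimization may leave the finishing time of
the slowest node unchanged while still reducing the busy time summed over all nodes.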

7.2.1     Intraquery Scheduling

For  the  intraquery  scheduling  strategy,  a  test  query  as  shown  in  Figure  7.2(b) 
is  used.  We  assume  that  each  of  the  object  classes  has  an  IID-size  of  5,000  initially. 
The  IID-size  of  class  C3  varies  from  5,000  to  500  so  that  we  can  observe  the  impact  of 
varying  the  difference  in  IID-sizes  between  class  C3  and  classes  C2  and  C8.  According 
to the intraquery scheduling strategy, either of the C2 and C3 classes can be passive.
Here we arbitrarily pick class C3 as the passive class.

Figure  7.3(a)  shows  the  response  time  for  the  single  query  in  two  situations.  One 
is  with  the  intraquery  scheduling  strategy  and  the  other  without.  We  observe  from 
this  figure  that  when  the  difference  in  the  IID-size  between  Class  C3  and  Classes  C2 
and  C8  is  large  (greater  than  3,000),  the  intraquery  optimization  strategy  improves 
the  response  time.  We  also  observe  from  Figure  7.3(a)  that,  when  the  difference  in 
IID-size  between  the  object  classes  is  not  very  significant,  starting  the  wavefronts 
from  the  active  nodes  with  smaller  IID-sizes  decreases  the  degree  of  parallelism  for 
the  single  query  so  that  the  response  time  is  longer. 

Figure 7.3(b) shows the total CPU time (excluding the idle time) of all the
slave nodes that are involved in this query. This figure shows that, by applying the
intraquery scheduling strategy, the total CPU time decreases, thus leaving more CPU
time for other queries.

Figure 7.1. Two Software Architectures: (a) A Master-Slave Software Architecture;
(b) A Peer-to-Peer Software Architecture

Figure 7.2. Schema Representation of Various Benchmark Queries: (a) Modeling of
Complex Objects; (b) Modeling of Complex Objects with the Inheritance of Attribute
Values; (c) Modeling of Interacting Complex Objects with the Inheritance of
Attribute Values

If we only consider the optimization of a single query, we can see from Figure 7.3(a)
that the threshold value of the difference in IID-sizes is about 3,000 (that is, when
the IID-size of the C3 class is 2,000). However, if our interest is in the optimization
of multiple queries, the response time of one query is not the only criterion that
needs to be considered. We also need to consider the cost of achieving that
response time: how much processing power is consumed by a single query and how much
is left for other queries. The goal in parallel multiple query processing is to achieve a
balance between the response time for each individual query (parallelism)
and the cost to achieve that (efficiency) so that the overall response time
of a set of concurrent queries is reduced. With this goal in mind, by combining
Figures 7.3(a) and (b), we can see that the threshold value of the difference in IID-size
can be somewhere between 0 and 3,000.

7.2.2     Partial Graph Processing

The partial graph processing strategy is tested for the identification approach
using the query graph shown in Figure 7.2(c). We randomly pick the object classes
from which data are to be retrieved. The response time and the total CPU time of
the query are shown in Figures 7.4(a) and (b).

Figure 7.3. Intraquery Scheduling Strategy

Two observations can be made. Firstly, the greater the ratio of the Dia (as
defined in Section 6.1) of a partial graph to the Dia of the whole query graph, the
shorter the response time and the greater the saving in the total processing time.
Secondly, the fewer the classes from which the final retrieval results
are to be accessed, the greater the saving in the processing time. Also, if a
large class does not contain the descriptive data of interest, the response time will
be significantly reduced. The location of the class in the query graph will affect the
response time as well.

One can also notice, by comparing Figures 7.4(a) and (b), that, while the response
time for a single query may not be significantly reduced by applying the
partial graph strategy, the total CPU time is. This is because the response time is
determined by the time (both idle and processing time) of the last slave node to
finish the processing of a query. In order to evaluate a query processing strategy in
the context of multi-query processing, both the response time of the query and the
total CPU time should be considered.

Figure 7.4. Partial Graph Processing Strategy

7.2.3     Interquery Scheduling and Common Pattern Sharing

Our  test  database  is  still  the  database  shown  in  Figure  7.2(c).  We  construct  a 
structure  of  sharing  which  consists  of  three  basic  structures.  We  gradually  increase 
the  number  of  queries  in  the  structure  while  we  measure  the  total  response  time  at 
each  step  until  it  consists  of  five  queries  as  shown  in  Figure  7.5. 

The performance evaluation is done for both the identification and elimination
approaches. We only present the results of the identification approach here because
the results of the elimination approach are similar. Figures 7.6(a) and (b) show the
performance results when both the result sharing and the sharing of the result retrieval
operation (e.g., the result collection phase) are applied. As expected, the performance is
further improved. We also note that the more queries are added to the sharing
structure, the greater the performance gain.

7.2.4     Distributed  Local  Selection  Sharing 

The same performance evaluation method used in the preceding subsection to
evaluate the interquery scheduling and the common pattern sharing
is used here. The test database and the structure of sharing are the same.
However, we do not use the result sharing and the result retrieval sharing; only
the distributed local selection sharing strategy is enabled. Figure 7.7 shows the
performance evaluation result. We observe the same kind of performance improvement
as in the last subsection. The increasing number of queries shown in the
figure represents the increase in the sharing of local selection operations.


Figure 7.5. Structure of Sharing Used in Performance Evaluation

Figure 7.6. Interquery Scheduling and Common Pattern Sharing Strategies

Figure 7.7. Distributed Local Selection Sharing Strategy

7.2.5     Scaleup and Speedup of Optimization Strategies

We  also  evaluate  the  scaleup  and  speedup  of  the  multiple  wavefront  algorithms 
and the optimization strategies. In our evaluation, the benchmark queries shown in
Figure 7.2 are used.

Many  application  domains  can  be  formed  by  varying  the  percentages  of  the 
three  benchmark  query  types.  In  this  dissertation,  only  the  evaluation  result  of  one 
application  domain  is  presented.  This  application  domain  has  the  same  percentage 
for  the  three  benchmark  query  types. 

Figure  7.8(a)  shows  the  scaleup  of  the  multiple  wavefront  algorithms  and  the 
optimization  strategies.  Several  conclusions  can  be  reached  from  the  figure.  Firstly, 
when  the  number  of  the  processors  is  much  smaller  than  the  number  of  object  classes 
in  the  schema,  a  reasonably  good  scalability  can  be  achieved.  This  is  because  each 
processor  stores  instances  of  multiple  classes  and  multiple  queries  access  data  from 
different  classes,  thus  achieving  some  degree  of  load  balancing.  When  the  number  of 
the processors is further increased, the scalability deteriorates because the additional
processors do not lighten the processing load of those processors which hold the
instances of object classes, owing to the class-per-node mapping strategy used in this
particular experiment. Secondly, the optimization strategies improve the scalability.

Figure 7.8(b) shows the speedup of the multiple wavefront algorithms and the
optimization strategies. From the figure, we observe quite good speedup when the
number of processors is much smaller than the number of object classes in a
schema. When the number of processors is increased, the speedup deteriorates.
The reason is the same as that given for the scalability. We can also observe
that the algorithms without optimization strategies have better speedup than the
ones with optimization strategies. The reason is that, when many object classes
are mapped to a processor, it is more likely that the optimization strategies can be
applied more effectively and the execution time of the queries can be reduced more
significantly. Thus, when the number of processors is increased, the execution time
cannot be reduced as much as in the case where the optimization strategies are not
applied. For the scaleup evaluation, the optimization strategies can be applied more
effectively because the problem size grows with the system size, thus achieving
better scalability.
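
For reference, the two measures can be written down as in the sketch below. The text
does not spell out the formulas, so the commonly used definitions are assumed here:
speedup compares one node with n nodes on a fixed workload, and scaleup compares one node
on a base workload with n nodes on an n-times larger workload. The timings are made up
for the example and are not measurements from this work.

```python
def speedup(time_one_node: float, time_n_nodes: float) -> float:
    """Fixed problem size: how many times faster n nodes are than one node."""
    return time_one_node / time_n_nodes


def scaleup(time_small_on_one: float, time_scaled_on_n: float) -> float:
    """Problem grown n-fold on n nodes: 1.0 means perfectly linear scaleup."""
    return time_small_on_one / time_scaled_on_n


# Illustrative timings in seconds (made up for the example, not measured).
s = speedup(80.0, 12.5)     # same workload on 1 node and on 8 nodes
u = scaleup(80.0, 100.0)    # 1x workload on 1 node vs. 8x workload on 8 nodes
```

Under these definitions an optimization that shortens the one-node time more than the
n-node time lowers the speedup ratio even though every absolute time improves, which is
consistent with the observation above.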


Figure 7.8. Scaleup and Speedup of the System: (a) Scaleup of the System;
(b) Speedup of the System (horizontal axes: Number of Processors)

7.3     Evaluations of Architectures

In our research, we evaluate the speedup and scaleup of the two architectures.
In the master-slave architecture, node 0 is designated as the master node and the rest
of the nodes as slave nodes. In the peer-to-peer architecture, node 0 is designated as
a C-node and the rest of the nodes as P-nodes. The results are collected by different
P-nodes. We use the hybrid data placement strategy for the evaluations in this
section.

7.3.1     Single  Class  Selection 

The speedup and scaleup of a single class selection are shown in Figure 7.9. It
can be observed that both architectures have good speedup and scaleup. In our
experiment, we also vary the size of classes and the attribute size of objects. We found
that the speedup and scaleup properties do not change significantly in either case.
However, if the size of a class is very small (e.g., 50 objects/class) or the attribute
size of an object is small (e.g., 5 bytes/object), the speedup and scaleup are not good.
This is because, when each data segment is less than or equal to the page size (4096 bytes
in our implementation), further partitioning will not improve the performance.
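
The page-size argument can be checked with a small calculation. The sketch below assumes
that a class's data are split evenly over the nodes and that the segment size is simply
objects times bytes per object divided by the number of nodes; both the function names
and this simplified size model are assumptions for illustration.

```python
PAGE_SIZE = 4096  # bytes, the page size stated in the text


def segment_bytes(objects: int, bytes_per_object: int, nodes: int) -> float:
    """Average size of one node's data segment under an even split."""
    return objects * bytes_per_object / nodes


def further_partitioning_helps(objects: int, bytes_per_object: int,
                               nodes: int) -> bool:
    """True while each segment still spans more than one page."""
    return segment_bytes(objects, bytes_per_object, nodes) > PAGE_SIZE


large_class = further_partitioning_helps(20000, 100, 64)   # 31250 bytes/segment
small_class = further_partitioning_helps(50, 100, 2)       # 2500 bytes/segment
small_attr = further_partitioning_helps(20000, 5, 64)      # 1562.5 bytes/segment
```

In the two small cases a segment already fits in one page, so adding nodes cannot reduce
the number of page reads per node any further.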


Figure 7.9. Single Class Selection (20000 objects/class, 100 bytes/object):
(a) Speedup; (b) Scaleup (horizontal axis: Number of Processors)

7.3.2     Two-Class Join

The speedup and scaleup of a two-class join are measured on three different
kinds of databases. As shown in Figure 7.10, the peer-to-peer architecture achieves
better speedup and scaleup than the master-slave architecture. We observe this
characteristic for all types of queries we tested. We also observe from Figure 7.11
that, in a peer-to-peer architecture, when the number of objects in a class is increased
from 4000 objects per class to 20000 objects per class, the speedup and scaleup
are degraded. This is because the increase in the number of objects increases
the message transmission overhead, a portion of the cost that cannot be completely
parallelized. In contrast, when the attribute size of an object is increased from
100 bytes/object to 5000 bytes/object (Figure 7.12), the speedup and scaleup improve.
This is because the increase in the attribute size increases the I/O cost, which is the
portion of the cost that can be parallelized. Similar characteristics can be observed
in a master-slave architecture. Varying the object class size or the attribute
size has a similar impact on the three-class join and the benchmark queries.
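
The effect of a cost component that cannot be parallelized can be illustrated with
Amdahl's law. This is a textbook model introduced here only as an aid to intuition; it is
not the cost model used in this work, and the fractions below are illustrative values
rather than measured ones.

```python
def amdahl_speedup(serial_fraction: float, nodes: int) -> float:
    """Amdahl's law: speedup when `serial_fraction` of the work cannot be spread."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / nodes)


# Illustrative fractions only; the dissertation does not report these values.
message_heavy = amdahl_speedup(0.20, 16)   # more time in message transmission
io_heavy = amdahl_speedup(0.02, 16)        # more time in parallelizable I/O
```

Raising the share of parallelizable I/O (larger attributes) pushes the achievable speedup
up, while raising the share of message transmission (more objects, more joins) pulls it
down.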

7.3.3     Three-Class  Join 

The speedup and scaleup of the three-class join are shown in Figure 7.13. Compared
with the two-class join, the speedup and scaleup are not as good. This is
because the three-class join increases the message overhead, which cannot be fully
parallelized.

Figure 7.10. Two-class Join (4000 objects/class, 100 bytes/object)

Figure 7.11. Two-class Join (20000 objects/class, 100 bytes/object)

Figure 7.12. Two-class Join (4000 objects/class, 5000 bytes/object)

Figure 7.13. Three-class Join (4000 objects/class, 100 bytes/object)

7.3.4     Benchmark Queries

Finally, we evaluate the scaleup and speedup of the benchmark queries shown
in Figure 7.2. The comparison is made between the master-slave and peer-to-peer
architectures. We observe that the peer-to-peer architecture achieves much better
speedup and scaleup than the master-slave architecture. This is because the
queries are much more complicated, and the result collection and traversal costs
at the master node become the bottleneck.

Figure 7.14. Speedup and Scaleup of Benchmark Queries


7.4     Evaluations  of  Two  Data  Placement  Strategies 

In this section, we present the evaluations of the placement strategies over two
different application domains. The benchmark query types shown in Figure 7.2 are
used. In the first application domain, we assume that there are many object classes,
each of which contains a small number of object instances. The queries in this
application domain involve all the object classes with an equal or nearly equal probability.
To simulate this application domain, our benchmark query set contains one Type
I query, one Type II query and eight Type III queries. There are 100 objects in
each class. In the second application domain, there are some large classes, and
the queries involve the large classes with a higher probability. To simulate this
application domain, we let C1, C2 and C3 have 20000 objects/class while the remaining
classes have 4000 objects/class. Our benchmark query set contains one Type III
query, one Type II query and eight Type I queries.

Two data placement strategies are evaluated, namely, the class-per-node
vertical partitioning and the hybrid partitioning strategies. In the class-per-node
vertical partitioning strategy, the instances of a class are stored in a single processor,
and each processor can hold the instances of multiple classes. At each processor,
the instances of an object class are partitioned vertically. In the hybrid partitioning
strategy, each object class is horizontally partitioned into segments, and each segment can
be further partitioned vertically. As explained in Section 4.2.1 on query modification,
a query is modified into another query based on the distribution of different
combinations of horizontally partitioned data. The modified query is processed against a
combination of data partitions using various optimization techniques to obtain the
final result of the original query.

We measure the response time for both data placement strategies in the two
application domains. In the class-per-node vertical partitioning strategy, at most
eight processors can be utilized. Therefore, we limit the number of processors to eight for
both strategies. Figure 7.15(a) shows the performance result in the first application
domain. In this domain, the class-per-node vertical partitioning strategy performs
better. This is because the benchmark query set involves the object classes with
nearly equal probability, so the processing power of the processors can be fully utilized.
The hybrid partitioning strategy, in contrast, does not utilize the processing power
of the processors any better than the class-per-node partitioning, yet it requires some
synchronization at "OR" branches, which introduces some overhead.

Figure 7.15(b) shows the performance result in the second application domain.
In this domain, the hybrid partitioning strategy performs better. This is because
the benchmark query set involves the object classes with different probabilities and the
sizes of the classes are different. When the class-per-node vertical partitioning strategy
is used, some nodes may finish query processing well ahead of the other
nodes and become idle. When the hybrid partitioning strategy is used, however, every class is
partitioned into eight segments and mapped to eight nodes. In this way, the modified
queries can fully utilize all the processing power. Even though the modified queries
introduce some overhead because of the synchronization, the overall response time
is improved.

Figure 7.15. Response Time of Benchmark Queries
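
The trade-off between the two placement strategies discussed in this section can be
illustrated with a toy load model. Everything in the sketch is an assumption made for the
example: work is expressed in abstract units per class, class-per-node assigns whole
classes round-robin, hybrid placement splits every class evenly, and synchronization is
modeled as a fixed extra cost per node.

```python
from typing import Dict, List


def class_per_node_loads(class_work: Dict[str, float], nodes: int) -> List[float]:
    """Assign whole classes round-robin; a node does all the work of its classes."""
    loads = [0.0] * nodes
    for i, work in enumerate(class_work.values()):
        loads[i % nodes] += work
    return loads


def hybrid_loads(class_work: Dict[str, float], nodes: int,
                 sync_overhead: float) -> List[float]:
    """Split every class evenly over all nodes, plus a fixed per-node sync cost."""
    share = sum(class_work.values()) / nodes
    return [share + sync_overhead] * nodes


def slowest_node(loads: List[float]) -> float:
    """The most heavily loaded node determines the response time."""
    return max(loads)


# Hypothetical work units per class (class size times how often it is queried).
uniform = {f"C{i}": 10.0 for i in range(1, 9)}
skewed = {"C1": 100.0, "C2": 100.0, "C3": 100.0,
          "C4": 5.0, "C5": 5.0, "C6": 5.0, "C7": 5.0, "C8": 5.0}

uniform_cpn = slowest_node(class_per_node_loads(uniform, 8))
uniform_hyb = slowest_node(hybrid_loads(uniform, 8, sync_overhead=1.0))
skewed_cpn = slowest_node(class_per_node_loads(skewed, 8))
skewed_hyb = slowest_node(hybrid_loads(skewed, 8, sync_overhead=1.0))
```

In this model the uniform workload favors class-per-node placement, because the load is
already balanced and the synchronization cost buys nothing, while the skewed workload
favors hybrid placement, because spreading the large classes removes the idle time.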


CHAPTER  8 
DISCUSSION  AND  CONCLUSION 


8.1     Discussion 

Whenever new optimization strategies are introduced, the overhead of using
these strategies needs to be considered since it is a part of the query processing cost.
Also,  in  multiple  query  optimization,  the  memory  size  can  become  a  limitation  to  the 
applicability  of  these  strategies.  We  address  these  issues  below. 

Overhead  of  the  Intraquery  Scheduling  Strategy:  The  overhead  of  this 
strategy  is  the  identification  of  passive,  semi-passive  and  active  nodes.  The  IID-size 
is  the  only  parameter  that  needs  to  be  considered  for  this  purpose.  We  have  defined 
the  IID-size  for  each  object  class  as  the  number  of  IIDs  that  need  to  be  sent  to  its 
neighbors  after  its  local  processing.  A  processor  calculates  the  IID-size  based  only  on 
the  local  information  (e.g.,  CON  array,  instance  connectivities  to  adjacent  neighbors 
and  the  distribution  functions  of  the  attributes)  instead  of  the  entire  query  graph. 
This  calculation  is  an  in-memory  operation,  and  the  traversal  of  the  entire  query 
graph  is  not  necessary  in  the  calculation. 

Overhead of the Partial Graph Processing Strategy: The numbering of
nodes in a query graph is the overhead for this strategy. We have pointed out in the
previous section that the numbering process is a search process on a query graph.
Usually, the number of classes involved in a query graph is not very large. Thus, the
overhead would be small.

Overhead of the Interquery Scheduling and Common Pattern Sharing:
Building the structures of sharing is the overhead of this multiple query optimization
strategy. It takes two steps to build the structures for a batch of queries. In the first
step, each query is compared with every other query in the batch to find out if they
can form a Case A (as discussed in Section 6.4) simple structure. The complexity of
this operation is O(n^2), where n is the number of queries in the batch. In the second
step, those simple structures are grouped to form one or more complex structures.
The total complexity is still O(n^2). Because of the memory limitation (to be discussed
below), only a limited number of queries can be batched, so the
overhead of building the structures of sharing should not be too significant.
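
The quadratic cost of the first step comes from comparing every pair of queries once,
which takes n(n-1)/2 comparisons. The sketch below illustrates only the pairwise loop.
The Case A test itself is defined in Section 6.4 and is not reproduced here; the
`contains` function is a hypothetical stand-in, and queries are reduced to the sets of
classes they reference purely for the sake of the example.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Tuple

Query = FrozenSet[str]  # hypothetical: a query reduced to the classes it touches


def find_sharable_pairs(batch: List[Query],
                        sharable: Callable[[Query, Query], bool]
                        ) -> Tuple[List[Tuple[int, int]], int]:
    """Compare every pair once; return sharable index pairs and the comparison count."""
    pairs: List[Tuple[int, int]] = []
    comparisons = 0
    for (i, a), (j, b) in combinations(enumerate(batch), 2):
        comparisons += 1
        if sharable(a, b):
            pairs.append((i, j))
    return pairs, comparisons


def contains(a: Query, b: Query) -> bool:
    """Stand-in test (not the Case A rule): one class set contains the other."""
    return a <= b or b <= a


batch = [frozenset({"C1", "C2"}), frozenset({"C1", "C2", "C3"}),
         frozenset({"C4", "C7"}), frozenset({"C1"})]
pairs, comparisons = find_sharable_pairs(batch, contains)
```

With a batch of four queries the loop performs six comparisons; doubling the batch size
roughly quadruples that count, which is why the batch size matters for this overhead.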

Overhead  of  the  Distributed  Local  Selection  Sharing:  In  this  strategy, 
all  the  decisions  on  sharing  are  made  in  the  slave  nodes  so  that  there  is  no  bottleneck 
problem  for  the  master  node.  The  overhead  is  in  identifying  the  selection  predicates 
and  deciding  the  order  of  executing  the  selection  operations. 

We have run different sets of queries to determine the overheads of query
optimization and the performance gains in response time. We did this for both the
identification and elimination approaches. The test database is the university database we
presented in Figure 3.1. A set of queries is used, each of which has 4 to 7 object
classes. Figure 8.1 shows the results for the identification approach. We can see
that the overhead is very small when compared with the gain in the total response
time. For example, when there are 14 concurrent queries, the overhead is 0.073
seconds while the performance gain is 121.49 - 67.28 = 54.21 seconds. We note here
that the overhead does not include the IID-size estimation because, according to the
research results reported in Sun's work [SunW93], attribute distribution functions
can be computed in advance and the use of these functions at run-time to determine
IID-sizes generates negligible overhead.

Figure 8.1. Optimization Overhead for the Identification Approach
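
The figures quoted for 14 concurrent queries can be put in proportion with a short
calculation. The sketch reads 121.49 seconds as the total response time without
optimization and 67.28 seconds as the time with optimization, which is an interpretation
of the subtraction given in the text; the variable names are chosen for the example.

```python
# Figures quoted in the text for 14 concurrent queries (seconds).
time_without_optimization = 121.49   # assumed reading of the larger figure
time_with_optimization = 67.28       # assumed reading of the smaller figure
overhead = 0.073

gain = time_without_optimization - time_with_optimization
overhead_share_of_gain = overhead / gain
relative_improvement = gain / time_without_optimization

print(f"gain = {gain:.2f} s")
print(f"overhead / gain = {overhead_share_of_gain:.2%}")
print(f"relative improvement = {relative_improvement:.1%}")
```

Under this reading the optimization overhead is a small fraction of one percent of the
time it saves.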


Memory Limitation: The main memory size of the processors is another
factor which needs to be considered in multiple query processing and optimization.
In traditional relational database processing, queries generate final or temporary
results in the form of relations. If one query can make use of the final or temporary
result produced by another query (e.g., the result of a selection operation), the final or
temporary relation generated by the latter should ideally be kept in the main memory
so that it can be readily used by the former without extra I/Os. However, this is not
always possible since the generated relation may be too large to be stored in the main
memory. In the storage structure and query processing strategy used in our system,
only the vertical binary columns (IID-IID and IID-attribute-value pairs) that are
relevant to query processing are fetched from the secondary storage, and the result of
a selection operation or the processing of a wavefront is a set of IIDs, which does not
occupy much main memory space. Therefore, these results can be kept in the main memory
for use by other queries. Also, during the result collection phase (i.e., the second
phase of query processing), descriptive data relevant to a set of retrieval queries can
be fetched by the slave processors from their corresponding secondary storage once
and in parallel, and be distributed into different memory buffers which are set up
for these queries. When the buffers are full, they can be transferred to the result
collection node, which assembles the data to produce the final instances for the
users. A proper buffering scheme can keep the pipelines of data flowing smoothly
from the slave nodes to the master node.
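
One possible shape of such a buffering scheme on a slave node is sketched below. It is
an illustration under stated assumptions rather than the implemented scheme: the record
shape, the `wanted` map from query name to marked IIDs, the fixed buffer capacity and the
`send` callback standing in for the transfer to the result collection node are all
hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Triple = Tuple[int, str, object]            # (iid, attribute, value)
Send = Callable[[str, List[Triple]], None]  # stand-in for the network transfer


def distribute(scan: Iterable[Triple], wanted: Dict[str, Set[int]],
               capacity: int, send: Send) -> None:
    """Route each scanned record to the buffer of every query that marked its IID."""
    buffers: Dict[str, List[Triple]] = {query: [] for query in wanted}
    for triple in scan:                     # the stored data are read only once
        for query, iids in wanted.items():
            if triple[0] in iids:
                buffers[query].append(triple)
                if len(buffers[query]) == capacity:
                    send(query, buffers[query])   # buffer full: ship it
                    buffers[query] = []
    for query, rest in buffers.items():     # flush partially filled buffers
        if rest:
            send(query, rest)


shipped: List[Tuple[str, int]] = []
distribute(
    scan=[(1, "name", "a"), (2, "name", "b"), (3, "name", "c")],
    wanted={"Q1": {1, 2, 3}, "Q2": {2}},
    capacity=2,
    send=lambda query, batch: shipped.append((query, len(batch))),
)
```

The buffer capacity bounds the memory used per query on each slave node, and shipping
full buffers as they fill is what keeps data flowing to the collection node while the
scan is still in progress.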

8.2     Conclusion 

In  this  dissertation,  we  have  provided  the  rationale  for  using  the  proposed  paral- 
lel architectures,  data  placement  strategies,  query  processing  and  optimization  strate- 
gies and  result  collection  strategies  for  the  storage  and  processing  of  object-oriented 
databases.  These  strategies  are  different  from  those  used  in  many  existing  parallel 
relational  database  systems  in  the  following  ways.  Firstly,  hybrid  partitioning  of  ob- 
ject instances  is  used  instead  of  the  popular  horizontal  partitioning  scheme  to  achieve 


a  high  scalability  and  avoid  the  access  of  large  instances  of  complex  objects  from  the 
secondary  storages.  Secondly,  a  query  modification  scheme  is  used  instead  of  the 
"split",  "merge"  or  "exchange"  operators  to  achieve  a  more  uniform  interprocessor 
communication  during  query  processing.  Thirdly,  the  two-phase  query  processing 
strategy  and  the  marking  of  object  instances,  instead  of  the  traditional  single-phase 
strategy  and  the  generation  of  temporary  relations,  avoid  the  propagation  of  large 
quantities  of  data  among  processors.  Fourthly,  the  graph-based  query  specification 
and  processing  strategy  instead  of  the  traditional  tree-based  strategy  can  offer  a 
higher  degree  of  parallelism  since  query  processing  can  start  at  multiple  processors 
and  in  multiple  directions  instead  of  the  fixed  leaves-to-root  order.  Fifthly,  the 
multi-wavefront  algorithms  instead  of  the  algebra-  and  tree-based  processing  algorithms  are 
used  to  allow  a  more  direct  implementation  and  processing  of  query  graphs.  Sixthly, 
the  query  optimization  strategies  introduced  in  this  dissertation  control  the  multiple 
initializations  of  wavefronts  and  the  directions  of  their  propagations  and  allow  queries 
to  share  their  processing  results  in  a  variety  of  ways.  They  are  particularly  suitable  for 
multi-wavefront  algorithms  and  the  graph-based  query  processing  strategy.  Lastly, 
the  distributed  result  collection  scheme  achieves  good  speedup  and  scaleup  for  the 
retrieval-type  queries. 

In  this  work,  we  have  implemented  the  above  strategies  and  evaluated  their 
performance.  We  have  shown  that  they  are  not  only  implementable  but  that  their 
use  also  improves  the  performance  of  multi-query  processing  with  negligible  overheads. 
We  do  not  claim  that  the  proposed  strategies  are  better  than  the  more  traditional 
strategies  used  in  many  existing  parallel  database  systems.  A  comparison  between 
these  two  sets  of  strategies  would  involve  actually  implementing  both  sets 
and  running  them  on  a  parallel  system  to  obtain  precise  performance  measurements. 
However,  this  task  would  be  too  laborious  to  undertake  because  there  are 
many  existing  strategies  and  variations.  Any  selection  of  a  subset  of  these  strategies 
for  comparison  purposes  would  be  subject  to  criticism  on  the  fairness  of  the  result, 
particularly  since  different  techniques  and  algorithms  are  bound  to  be  used  to  implement 
them.  Nevertheless,  we  do  suggest  that,  due  to  the  different  characteristics  of 
object-oriented  databases,  researchers  in  parallel  database  systems  should  investigate 
different  query  processing  and  optimization  strategies.  The  ones  described  in  this 
dissertation  are  some  examples. 